CHAPTER 1


INTRODUCTION 


1.1  MOTIVATION

The demand for increasing computational power continually
outruns the increasing capabilities of sequential computers. This
fact has long been recognised, and models for parallel computation
have been developed to allow a problem to be decomposed and solved
concurrently by more than one processing unit. However,
decomposition of a problem into independent subunits communicating
through well-defined protocols has been difficult to achieve in
the normal sequential memory-processor paradigm of computation.
Processes must communicate by changing portions of a common state.
Synchronisation, mutual exclusion, and variable sharing require
significant effort on the part of the programmer. Parallel
programs using such directives as FORK, JOIN, and mutual exclusion
variables are more complex than sequential programs, and more
difficult to verify as correct.


or environment which influences the result. A given set of inputs
to the computation always results in the same output. These
characteristics give data flow languages a functional semantics.

Nonprocedural languages, although not defined specifically
for data flow, seem particularly appropriate as data flow
languages [Acke79, Ashc79]. First, they satisfy the properties of
single assignment and referential transparency which data flow
languages have. In addition, they support a very high level,
concise form of problem description. In Model, for example, both
recurrence relations and familiar array operations can be
expressed using subscripted arrays. In contrast, these tasks
require loop construction in conventional high level languages and
in data flow languages. Thus, with a nonprocedural language, more
of the problem can be expressed in the problem domain rather than
in the programming domain. The specification is more
self-documenting than a conventional program, can be written more
quickly, and can be more easily changed; all these benefits arise
because the problem rather than the implementation is described.
Another advantage of a nonprocedural language such as Model is in
the area of data memory usage. The user can describe data
structures in the form most convenient to the problem. The
translator can then generate from that description the data
structure most efficient for the implementation.


To illustrate the desirability of using a nonprocedural
language rather than a lower level language to specify a problem,
consider the nonprocedural specification of Figure 1.1, a matrix
multiply program. The longest part of the program is the data
declaration. There are only two assertions. The first assertion
computes a three dimensional array X, obtained by multiplying the
appropriate rows and columns of the input matrices A and B. The
second assertion uses a reduction function SUM to add together
elements of the innermost dimension of X and produce the matrix
product C.


MODULE: MM;

SOURCE: INFILE1, INFILE2;     /* input files */
TARGET: OUTFILE;              /* output file */

INFILE1 IS FILE (INREC1);     /* Matrix A */
INREC1 IS RECORD (IN1(10));
IN1 IS GROUP (A(10));
A IS FIELD (NUMERIC);

INFILE2 IS FILE (INREC2);     /* Matrix B */
INREC2 IS RECORD (IN2(10));
IN2 IS GROUP (B(10));
B IS FIELD (NUMERIC);

OUTFILE IS FILE (OUTREC);     /* Matrix C */
OUTREC IS RECORD (OUT1(10));
OUT1 IS GROUP (C(10));
C IS FIELD (NUMERIC);

X IS FIELD (NUMERIC);         /* temporary */

I IS SUBSCRIPT (10);
J IS SUBSCRIPT (10);
K IS SUBSCRIPT (10);

X(I,J,K) = A(I,K) * B(K,J);   /* Assertion 1 */
C(I,J) = SUM(X(I,J,K),K);     /* Assertion 2 */


Figure 1.1  Matrix Multiply in Model
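The two assertions of Figure 1.1 can be mirrored in a few lines of Python. This is only an illustrative sketch, not part of the Model system: the function name `model_matmul` is hypothetical, plain nested lists stand in for the Model files and records, and the matrices are assumed to be square and of equal size.

```python
def model_matmul(a, b):
    """Mirror the two Model assertions for square matrices of equal size."""
    n = len(a)
    # Assertion 1: X(I,J,K) = A(I,K) * B(K,J), a three dimensional array.
    x = [[[a[i][k] * b[k][j] for k in range(n)]
          for j in range(n)]
         for i in range(n)]
    # Assertion 2: C(I,J) = SUM(X(I,J,K), K), a reduction over the innermost dimension.
    return [[sum(x[i][j]) for j in range(n)] for i in range(n)]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = model_matmul(a, b)
print(c)
```

As in the specification, no loop order is implied by the meaning of the two comprehensions: every element of X depends only on A and B, and every element of C depends only on one row of X.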

In contrast, the corresponding program in the lower level Id
language contains two procedures and a three level nested loop.
The call

mmt(a, transpose(b, m, n), l, m, n)

returns the product of the l by m matrix a and the m by n matrix
b. The example, taken from [Gost80], is shown in Figure 1.2.


procedure transpose(b,m,n)
  (initial trans<-LAMBDA
   for i from 1 to n do
     new trans<-append(trans,i,(
       initial row<-LAMBDA
       for j from 1 to m do
         new row<-append(row,j,b[j,i])
       return row))
   return trans);

procedure mmt(a,bt,l,m,n)
  (initial c<-LAMBDA
   for i from 1 to l do
     rowa<-a[i];
     new c<-append(c,i,(
       initial rowc<-LAMBDA
       for j from 1 to n do
         colb<-bt[j];
         new rowc<-append(rowc,j,(
           initial innerprod<-0
           for k from 1 to m do
             new innerprod<-innerprod+rowa[k]*colb[k]
           return innerprod))
       return rowc))
   return c)

mmt(a,transpose(b,m,n),l,m,n);


Figure  1.2  Matrix  Multiply  in  Id 


The Model specification is concerned with the problem domain
rather than the implementation domain. In contrast, one must
understand clearly the semantics of the data flow machine to be
able to program in the lower level language.


1.2  CONTRIBUTIONS 


The major contribution of this research is the development of
a system which translates a program specification in a very high
level nonprocedural language to a lower level data flow language.

In the process of specification analysis and translation, a
problem description goes through the following transformations:

1. The problem is defined in the Model language to form a
specification.

2. The specification is analyzed by the Model Processor, and
a form of data dependency graph, the array graph, is created.

3. The array graph is processed to yield a data flow program
template.

4. The template is translated into the MaD data flow
language.

5. The MaD program is compiled to produce machine code.

6. The code is run on the data flow machine to produce the
problem solution.


These  transformations  are  outlined  in  Figure  1.3. 
Figure 1.3  From Problem Specification to Solution

The nonprocedural language Model has been used as the source
language for translation. Parts of the existing Model system,
which generates a PL/1 program from the specification, have been
used. These programs create the data dependency graph and perform
semantic checking. The graph is input to the data flow subsystem,
which partitions the graph to produce data flow program units.
The program units are called blocks. Two types of blocks are
distinguished: iterative and parallel. Blocks are generated as a
result of analysing the Maximally Strongly Connected Components
(MSCC's) of the graph. A block may be generated either from a
single MSCC or from a larger component created by merging MSCC's.
The process of generating iterative and parallel blocks from a
nonprocedural specification is called scheduling. The result of
scheduling is a language-independent intermediate form of program,
the data flow template. Using practical experience with the
Manchester machine, as well as a comparative analysis of several
other data flow machines, criteria by which to merge components to
produce an efficient template have been developed.
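As a rough illustration of the first step of scheduling, the strongly connected components of a small dependency graph can be found with Tarjan's algorithm. The sketch below is not the Model Processor's scheduler: the graph, the node names, and the helper `has_cycle` are hypothetical, and the text above does not say which algorithm the system uses or how a component is mapped to an iterative or a parallel block.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. `graph` maps each node to a list of successor nodes."""
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []

    def visit(v):
        index[v] = lowlink[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:
            # v is the root of a component: pop the component off the stack.
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            components.append(sorted(component))

    for v in graph:
        if v not in index:
            visit(v)
    return components


def has_cycle(component, graph):
    """True if the component contains a dependency cycle (including a self-loop)."""
    return len(component) > 1 or component[0] in graph.get(component[0], [])


# Hypothetical dependency graph: S and T depend on each other (a recurrence).
graph = {"A": ["X"], "B": ["X"], "X": ["C"], "S": ["T"], "T": ["S", "C"], "C": []}
components = strongly_connected_components(graph)
for comp in sorted(components):
    print(comp, "cyclic" if has_cycle(comp, graph) else "acyclic")
```

Merging components into larger blocks, as described above, would then be a separate pass over this list guided by the efficiency criteria developed in this research.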

This research studies in detail one data flow implementation,
the Manchester machine, and a data flow language developed for the
machine, MaD. Program templates generated from a Model
specification are translated to the MaD Programming Language.

1.3  ORGANIZATION OF THE DISSERTATION

Chapter 2 reviews data flow, from the conceptual model to
various architectural implementations, and explores problem areas
in data flow. In addition, it describes the architecture of the
Manchester University data flow machine. Chapter 3 discusses
languages for data flow computers. It focuses in particular on
the MaD (Manchester Data flow) language and on the Model
Specification Language. In Chapter 4, Model internals are
discussed: the array graph, subscript processing, and the notion
of range. Chapters 5 and 6 describe the scheduling and storage
allocation algorithms. Translation of the data flow template to
MaD is done in the code generation phase of processing, the topic
of Chapter 7. Chapter 8 summarises and suggests further areas of
research in nonprocedural languages for data flow. Appendix A
contains a description of the MaD Programming Language in BNF
form, and Appendix B, a description of Model. Appendix C contains
several examples in which Model specifications are translated to

CHAPTER II


DATA FLOW


In this chapter, the conceptual framework and some machine
realisations of data flow architecture are presented. In the first
section, the data flow model is contrasted with the conventional
sequential model. Next, several architectural implementations of the
model are reviewed. The next section discusses problem areas for data
flow implementations. Finally, the Manchester data flow machine is
described.


II.1  THE DATA FLOW COMPUTATION MODEL

Data flow is a model of computation which represents an algorithm
as a directed graph showing data dependencies. In the graph, each
node represents an instruction. Tokens (the term used for data)
travel on the directed arcs from the node which produced them to nodes


Figure 2.2  A Data Flow Graph of the Computation


II.1.1  Properties of the Data Flow Model


The example illustrates three desirable properties of the data
flow model. First, the data flow graph shows parallelism of the
computation at all levels, from machine op level up. Second, under
the data flow model, there is no concept of memory address. And
third, the graph displays referential transparency.

II.1.1.1  Parallelism -

Parallelism of an algorithm is exposed because the graph is a
partial order rather than the total order imposed by a sequential
machine. Sequential machine instructions are executed in a fixed
order according to a single program counter. Possible parallelism may
be inferred to a limited extent from evaluation of the algorithm.
However, since memory cells can hold different values at different
times, the potential for detection of parallelism is reduced. If the
programmer reuses a variable for more than one purpose, computation
involving the second usage, even if it is independent of the first,
must occur after the first.

With the data flow model, asynchrony and parallelism are
implicit. Each computation is constrained only by the availability of
its input data. The programmer does not have to specify concurrent
operation of procedures. "Co-begin", "fork", or "activate task"
constructs are not necessary. The data dependency graph
representation of the program exposes all possible parallelism.
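The rule that a computation is constrained only by the availability of its input data can be sketched as a small simulator. This is an illustrative model only, not a description of any machine discussed in this chapter: the function `run_dataflow`, the node names, and the expression (a+b)*(c-d) are all hypothetical. Nodes that become ready in the same step are grouped together to show where concurrent execution would be possible.

```python
import operator


def run_dataflow(nodes, initial_tokens):
    """Fire every node as soon as all of its inputs hold a value.

    nodes: name -> (function, list of input names)
    initial_tokens: name -> value for the graph's external inputs
    Returns the final values and the groups of nodes that fired together.
    """
    values = dict(initial_tokens)
    pending = dict(nodes)
    firing_order = []
    while pending:
        ready = sorted(name for name, (_, inputs) in pending.items()
                       if all(i in values for i in inputs))
        if not ready:
            raise ValueError("graph cannot make progress: missing inputs")
        # Every node in `ready` is enabled at once and could run concurrently.
        for name in ready:
            fn, inputs = pending.pop(name)
            values[name] = fn(*(values[i] for i in inputs))
        firing_order.append(ready)
    return values, firing_order


# Hypothetical graph for (a + b) * (c - d).
nodes = {
    "sum": (operator.add, ["a", "b"]),
    "diff": (operator.sub, ["c", "d"]),
    "prod": (operator.mul, ["sum", "diff"]),
}
values, firing_order = run_dataflow(nodes, {"a": 1, "b": 2, "c": 7, "d": 3})
print(values["prod"], firing_order)
```

No "fork" or "co-begin" appears in the graph description; the two independent nodes simply become enabled in the same step.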


II.1.1.2  Lack Of Address -

In the data flow model, data is described in terms of values
rather than addresses. Inherent in the conventional model is the idea
of an address [Arvi78]. The address of a memory cell is invariant,
while the value stored in the cell changes with time. In the data
flow model, there is no address. (We will see later that
implementations of the model do use addresses rather than data to
transmit structured values.) Values, or tokens, produced by
processors travel on the arcs to other processors. Data values input
to a node are applied to the computation represented by the node.
Output values are produced. These values are, in turn, input to other
nodes. For convenience, the tokens are usually named. Since it would
be ambiguous for one name to refer to more than one value, data flow
programs usually follow the "single assignment rule". One name can
denote only one value. Multiple assignment such as in the statement

I = I + 1

is not allowed in the same fashion as in conventional programs.

II.1.1.3  Referential Transparency -

The third property of data flow is referential transparency.
This property implies that there is no environment or context or side
effect to influence the result of a computation. In the conventional
model, the change in value of a memory cell might depend on some
complex interrelationship of the current values of other memory cells.
Values of cells change in time, so that when the interrelationship is
tested is as important as the interrelationship itself.

Under data flow, however, any parameter which is needed by the
node must be input to the node. The input values uniquely determine
the results produced. A given set of values input to a processor
produces the same results regardless of when the computation occurs, so
that the node can be characterised as a mathematical function. The
graph, therefore, displays a locality of effect in which "the
mathematical equations for a data flow program can be derived simply
by conjoining the equations for the various parts of the program in an
'additive' manner" [Kosi79].

These properties of data flow graphs make the semantics of data
flow programs tractable to formal description. Being able to describe
precisely the "meaning" of language statements is helpful in many
respects. It is then possible to have a standard against which to
test compilers for compliance with a language specification. Having a
mathematically precise set of axioms defining a language makes it
possible to attempt to prove theorems about the behavior of programs.
The relative ease with which denotational semantics has been developed
for data flow languages has encouraged research into suitable
architectural implementations for data flow.
II.2  ORGANIZATION OF DATA FLOW MACHINES

Data flow machines have been built or proposed by various groups.
Texas Instruments, the University of Utah, the CERT Laboratory in
Toulouse, and the University of Manchester have all built prototype
systems. Dennis and Arvind at MIT were responsible for much of the
ground work in data flow. Each is building a prototype machine. In
this section we will develop concepts common to many of the
implementations.

II.2.1  The Data Flow Instruction

The  node  of  the  data  flow  graph  is  realized  in  the  machine  as  an 
instruction.  A  data  flow  instruction  differs  somewhat  from  a 
conventional  machine  instruction.  In  addition  to  the  op  code,  the 
data  flow  instruction  must  carry  information  about  the  arcs-  that  is, 
the  inputs  and  outputs  to  the  instruction.  An  instruction  might  look 
as  follows: 

Opcode 

Number  of  Destinations 
for  each  Destination: 

node  number  of  the  Destination 

input  point  into  the  Destination  (which  input  arc?) 

Number  of  Inputs 
for  each  input: 
input  type 
There can, of course, be many variations to this format. Space may be
reserved with the instruction for input parameters to be collected.
The Texas Instruments Distributed Data Processor (TI DDP)
instruction, illustrated in Figure 2.4, follows this convention.
There may be a limitation as to the number of input parameters and/or
destination addresses. The Manchester data flow machine allows a
maximum of two inputs and two destinations.
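The fields listed in the instruction format can be written down as a small record type. This is a sketch for illustration only: the class and field names are hypothetical, only the fields named in the format above are included, and the default limits in `check_limits` simply restate the two-input, two-destination restriction attributed above to the Manchester machine.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Destination:
    node: int        # node number of the destination instruction
    input_port: int  # which input arc of the destination receives the result


@dataclass
class Instruction:
    opcode: str
    destinations: List[Destination] = field(default_factory=list)
    input_types: List[str] = field(default_factory=list)

    def check_limits(self, max_inputs=2, max_destinations=2):
        """True if the instruction respects the given input/destination limits."""
        return (len(self.input_types) <= max_inputs
                and len(self.destinations) <= max_destinations)


add = Instruction("ADD",
                  destinations=[Destination(7, 0), Destination(9, 1)],
                  input_types=["integer", "integer"])
print(add.check_limits())
```

A format that reserves space for collected operands, as described for the TI DDP, would add a slot per input to this record; that variation is not modelled here.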


Figure 2.4  TI DDP Instruction

Figure 2.5  Processing Unit of the TI DDP


In the LAU System, which was constructed at the CERT Laboratory
in Toulouse, instruction memory has control bits associated with each
instruction. A processor continually scans the instruction memory for
enabled instructions.

In the Manchester machine and in Arvind's machine, there is a
Matching Unit. This unit assembles the parameters to an instruction.
When all parameters have arrived, a "group packet" consisting of
opcode, input parameters, and destination address is sent to a
processing element in the Processing Unit. A block diagram of the
Manchester Data Flow Machine is shown in Figure 2.6. A diagram of
Arvind's machine is shown in Figure 2.7. The Token Queue in the
Manchester machine corresponds to the Input Section of Arvind's
machine. The Matching Store corresponds to the Waiting-Matching
Section. The Node Store corresponds to the Instruction Fetch Section
and Program Memory. The Processing Unit corresponds to the Service
Section. The Switch corresponds to the Output Section. In the
Manchester machine the function of Arvind's Data Structure Memory is
performed in the Matching Store.


Figure 2.7  MIT (Arvind) Data Flow Machine


II.2.3  Processor Network Topology

It is necessary to provide a path of communication between
processors of a data flow machine. Many architectures specify a
two-level hierarchy. An individual processor usually consists of a
local instruction memory, a local token memory, and one or more
processing elements. The higher level connects the processors. The
TI DDP may have up to four processors on a common bus, called the DCLN
ring. Data produced at one processor and used by another processor is
routed through a switch at the sending processor (see Figure 2.5).
The data then travels along the DCLN ring to the destination
processor, and is routed through the switch at the destination
processor into its local memory. Arvind is designing a machine having
64 processors in the network. Data produced by one processor and used
by another must travel through the network. Data produced by a
processor and used within the same processor may short-circuit the
network. In these network topologies, each processing unit can
address any other one directly. The TI and Arvind network topologies
are illustrated in Figures 2.8 and 2.9 respectively.


II.3  PROBLEM AREAS IN DATA FLOW DESIGN


There are several problem areas in designing a data flow machine.
We will discuss three topics of special relevance to the task of
scheduling a nonprocedural specification for data flow.

Some questions of concern are:

First, can more than one token be on an arc at one time? Figure 2.11
shows an instruction X with two inputs. On the left hand input, there
is one token waiting; on the right hand input, there are two tokens.
How does the machine recognize that input A1 belongs with input B1 to
instruction X rather than with input B2?

Second, how is a program graph to be partitioned among processors in a
data flow machine? What properties of the graph should be considered
in generating such a partition?

Third, how is structured data handled on the data flow machine? What
data type can a token be: can an array be considered one token, or is
a token constrained to be of elementary type such as integer word?
How does the machine handle structured data such as an array?

We will examine these questions and find how various data flow machine
designs handle the problems.




II.3.1  Reentrancy in Graph

The first question concerns reentrancy of the graph. If an
instruction in the graph can have more than one set of inputs existing
at one time, the graph is reentrant. When such a condition exists
(several sets of tokens input to an instruction), there must be some
way of keeping the token sets distinct. If the implementation does
not support reentrancy, then there may be only one set of tokens
active at one time. Only when that set has been processed may another
set be input to the node. The TI DDP and LAU follow this strategy.
In the TI DDP, input values to a node are stored with the node. When
all required inputs have arrived, the instruction (node) is linked
into an enabled-instruction list to be executed as processor
availability permits. Because of this implementation, the graph is
not inherently reentrant. If a portion of the program needs to be
reentrant, the subgraph corresponding to that program fragment must be
duplicated so that distinct input sets may be allocated unique storage
areas.

On the LAU system, the compiler recognizes an EXPAND directive
which duplicates a subgraph a programmer-specified number of times.
This is required in order to allow parallel computation of array
elements, for example. If the number of duplicate copies needed is
not known at compile time, the computation must proceed iteratively.

The Manchester machine and Arvind's machine, which support
reentrancy, use token labeling to keep sets of tokens distinct.
Tokens which belong to the same input set to a node have the same
label.

Figure 2.12 shows the label format for the Manchester machine. A
label consists of an index and a color. The index is used to
distinguish elements of an array when multiple array elements might be
on the same input arc to a node. The color portion of the label
permits reuse of the graph. It is divided into an activation name,
used to distinguish concurrent calls to a function, and an iteration
level, used to distinguish concurrent iteration instances. There is
flexibility in the size of each field. The index may use 0-20 bits,
the activation name 0-32 bits, and the iteration level 0-20 bits.
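The way labels keep token sets distinct can be sketched as a matching store keyed by destination node and label. This is a simplified illustration, not the Manchester hardware: the class `MatchingStore`, the tuple layouts, and the use of a Python dictionary are assumptions, and field widths are ignored. It replays the earlier question of how A1 is paired with B1 rather than B2 at instruction X.

```python
from collections import namedtuple

# A label carries the index and the two parts of the color described above.
Label = namedtuple("Label", ["index", "activation", "iteration"])
Token = namedtuple("Token", ["node", "port", "label", "value"])


class MatchingStore:
    """Hold tokens until every input of (node, label) has arrived."""

    def __init__(self, arity=2):
        self.arity = arity
        self.waiting = {}  # (node, label) -> {port: value}

    def receive(self, token):
        key = (token.node, token.label)
        slots = self.waiting.setdefault(key, {})
        slots[token.port] = token.value
        if len(slots) == self.arity:
            del self.waiting[key]
            # Complete input set: return it in port order, ready to execute.
            return (token.node, token.label,
                    tuple(slots[p] for p in sorted(slots)))
        return None


first = Label(index=0, activation=0, iteration=1)
second = Label(index=0, activation=0, iteration=2)
store = MatchingStore()
r1 = store.receive(Token("X", 1, second, "B2"))  # waits: no partner yet
r2 = store.receive(Token("X", 1, first, "B1"))   # waits: no partner yet
r3 = store.receive(Token("X", 0, first, "A1"))   # matches B1, not B2
print(r3)
```

B2 stays in the store until a left-hand token with the same label arrives, which is what allows several input sets for one instruction to coexist.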

The  Manchester  machine  design  incorporates  opcodes  to  manipulate 
labels.  The  Yield  opcodes  accept  an  input  of  any  type  and  return  the 
label  or  individual  label  fields.  The  Extract  opcodes  accept  a  token 
of  type  label  as  input  and  extract  fields  from  the  label.  The  Set 
opcodes  accept  label  or  label  field  tokens  as  input  and  produce  new 
label  tokens. 

The data flow template produced by the Model Processor has been
applied to the Manchester System, which supports reentrancy. However,
in order to be useful for non-reentrant data flow machines, sufficient
information is stored in the data flow template to allow construction
of an EXPAND-like directive.


II.3.2  Partitioning The Program Among Processors


As we have seen from Section II.2.3, a data flow machine consists of
one or more processors communicating through an interconnect. A
processor has only local storage for data flow programs and data
tokens. There is no common memory for instructions or for data. If
data produced by a processor is needed by an instruction in another
processor, that data must be transmitted along the communication path
to the other processor. In the TI DDP, the communication path
consists of a common bus. In Arvind's data flow machine, there is an
interconnection network to join the processors in the machine. If
there is no direct path from one processor to the second, the data has
to be routed through one or more intermediate links. The Utah DDM
interconnect has this property.

To  minimize  the  cost  of  data  transmission  between  processors  in 
the  machine,  it  is  desirable  to  follow  the  principle  of  locality  of 
reference.  A  partition  of  the  graph  allocated  to  one  processor  should 
contain  a  related  set  of  instructions.  The  instructions  are  related 
in  the  sense  that  data  produced  by  one  instruction  in  the  program 
subgraph  is  used  by  other  instructions  resident  in  the  same  processor. 
Such  a  partition  minimizes  the  number  of  tokens  which  must  be 
transmitted  over  the  interconnect  to  other  processors  in  the  machine. 
The scheduler, therefore, attempts to partition the data flow program
into blocks of related instructions in which each block is an
independent unit of allocation.
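The cost that locality of reference tries to reduce can be made concrete by counting the arcs whose producer and consumer are placed on different processors. The sketch below is only an illustration of that measure: the function name, the example graph, and the two assignments are hypothetical, and each arc is assumed to carry one token.

```python
def cross_processor_tokens(arcs, assignment):
    """Count arcs whose producing and consuming instructions sit on different processors."""
    return sum(1 for src, dst in arcs if assignment[src] != assignment[dst])


# Hypothetical graph: a -> b -> c -> e and d -> e.
arcs = [("a", "b"), ("b", "c"), ("d", "e"), ("c", "e")]

# Related instructions kept together: only the arc c -> e crosses processors.
clustered = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
# Instructions scattered without regard to data dependencies.
scattered = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0}

print(cross_processor_tokens(arcs, clustered), cross_processor_tokens(arcs, scattered))
```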


II.3.3  Data Structures


The third question to be considered is that of handling data
structures. What mechanism does the machine use to access structured
data such as arrays and records? Arvind's proposed machine requires a
"structure controller" to access structured data. The Manchester
machine has special opcodes added on to the basic machine to handle
structures. Since structured data and the array data type are very
important to the Model Specification Language, understanding how
structures are implemented on data flow machines is important in
translating Model structures to data flow.

The use of structured data poses difficulties to a "pure" data
flow machine. Accessing structured data on a conventional sequential
machine is, by contrast, much simpler. The base sequential machine
has a linearly addressed memory (or a hierarchy thereof). Structured
data defined through high level programming languages is mapped to the
linear memory. A compiler usually generates code to calculate the
array offsets. If an offset is known at compile time, the composite
program-relative address can be used directly. In either case, after
the initial address calculation, data in a structure is accessed and
stored just as if it were simple data.
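The offset calculation a conventional compiler emits can be illustrated with a short function. This is a generic sketch under stated assumptions: row-major layout and zero-based indices are assumed (the text does not specify a layout), and the function name and parameters are hypothetical.

```python
def row_major_offset(indices, dims, base=0, elem_size=1):
    """Map a multi-dimensional index to an address in linear memory (row-major, zero-based)."""
    offset = 0
    for idx, dim in zip(indices, dims):
        if not 0 <= idx < dim:
            raise IndexError("index %d out of range for dimension %d" % (idx, dim))
        # Horner-style accumulation: each dimension scales what came before it.
        offset = offset * dim + idx
    return base + offset * elem_size


# Element (2, 3) of a 10 by 10 array of 4-byte words starting at address 1000.
print(row_major_offset((2, 3), (10, 10), base=1000, elem_size=4))
```

When every index is a compile-time constant, the whole expression folds to a single address, which is the case described above in which the program-relative address is used directly.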

On a data flow machine, accessing structured data is more
complicated. By way of example, consider the reference to an array
element A(5) in a computation. In the data flow graph, A(5) is input
to the computation. However, does this mean that only one element is
input to the node, or does it mean that the entire array is input? If
the former is inferred, then where is the rest of the array A? There
is no memory in the conventional sense in a "pure" data flow machine.
If the entire array is input to the node, serious efficiency
considerations appear. A token, whether structure or simple field,
must be duplicated and sent to every node which needs it. Each time
an array element or one field of a record is changed, according to the
strict semantics of data flow, the entire structure must be
duplicated, and a new structure, with a new name, created. Even when
this is done conceptually rather than literally, an implementation
needs additional functional units and/or machine opcodes to support
structured data. The following section describes some ways of
handling structures in a data flow environment.

II.3.4  Structure Processing On Some Data Flow Machines

One approach which has been taken in some machine designs
(Arvind, DDM) is to attach another processor to the data flow unit, a
structure controller, to handle accessing and storing structured data.
The Utah DDM uses an intelligent memory at each processing unit to
handle all data, structured or simple. The Atomic Storage Unit (ASU)
provides a "location independent method for dealing with an arbitrary
structure of variable length fields" [Davi79]. The elementary item of
storage is the field. A field is either a variable number of
characters delimited by two reserved characters (left and right
parentheses) or else a sequence of any number of fields enclosed in
parentheses. A parenthesised field, therefore, represents a data
structure. It is equivalent to a generalized tree. A field which has
subfields is called a file. The first subfield of the file serves as
a descriptor to the rest of the file. The non-descriptor fields of
the file may be ordered or unordered. Unordered fields must be
accessed by name. Ordered fields may be accessed by name or by
position. The ASU can execute such commands as inserting or deleting
fields and positioning to a certain field. Since fields can be
modified at this primitive level, the higher levels must ensure that
the modified structures are referred to by new names in the programs
using the fields. The ASU also manages all memory allocation and
reclamation.

Dennis  has  proposed  a  Structure  Controller  to  handle  structured 
data  for  his  machine.  Like  the  Utah  ASU,  the  Structure  Controller 
accepts  read  and  write  requests  from  the  data  flow  processor,  and 
manages  the  dynamic  memory  allocation  and  reclamation.  The  Dennis 
Controller  realizes  structures  as  acyclic  binary  trees.  A  node  is 
either  an  elementary  value  or  a  pointer  to  other  nodes.  A  node  is 
addressed  by  its  selector,  which  is  its  position  in  the  tree.  The  two 
basic operations on a structure are SELECT and APPEND. SELECT, when
given a structure name and selector, returns the value at that
selector. APPEND, given a structure, selector, and value, returns a
new structure identical to the original one except that the new value
has been inserted at the indicated selector. In order to increase
efficiency, structures can be shared. Each node has a reference
count, which is the total number of pointers to that node. A SELECT
operation to a node causes the node's reference count to be increased.
When an APPEND occurs at a selector, the node which is being
replaced must have its reference count decremented. When an APPEND
occurs and the reference count is greater than one, new pointers must
be created to the unchanged portion of the structure. When the
reference count reaches zero, a cell may be reclaimed [Acke78].
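
The SELECT and APPEND operations can be illustrated with a minimal Python sketch. It is not the Dennis design itself: the selector is assumed to be a string of "L"/"R" steps, leaves are plain values, and explicit reference counts are omitted (Python's own memory management stands in for them). The names Node, select, and append are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Interior node of an acyclic binary tree; leaves are plain values."""
    left: object = None
    right: object = None


def select(tree, selector):
    """Follow the 'L'/'R' steps of selector from the root and return what is there."""
    for step in selector:
        tree = tree.left if step == "L" else tree.right
    return tree


def append(tree, selector, value):
    """Return a new structure equal to tree except that value sits at selector.

    Only the nodes on the path to selector are copied; every untouched
    subtree is shared between the old and the new structure.
    """
    if not selector:
        return value
    node = tree if isinstance(tree, Node) else Node()
    if selector[0] == "L":
        return Node(append(node.left, selector[1:], value), node.right)
    return Node(node.left, append(node.right, selector[1:], value))


original = Node(Node(1, 2), Node(3, 4))
updated = append(original, "LR", 99)
```

Here original is left unchanged by the APPEND, and the right subtree of updated is the same object as the right subtree of original, which is the sharing that the reference counts are meant to track.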

A  variation  to  this  form  of  structure  controller  is  Arvind's 
I-structure  controller.  An  I-structure  is  an  array-like  data 
structure.  The  storage  for  the  structure  is  allocated  before  each 
element  of  the  structure  is  defined.  In  order  to  support  the 
unordered  nature  of  structure  construction,  each  element  has  a 
presence  bit  associated  with  it.  If  an  undefined  element  is 
referenced,  the  read  is  deferred.  Once  the  element  is  defined,  all 
deferred reads must be honored [Arvi80].
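
The presence bit and deferred read mechanism can be sketched in Python as follows. This is an illustration under stated assumptions, not Arvind's controller: readers are modelled as callbacks, and a second write to the same element is rejected on the assumption that each element is defined only once. The class and method names are hypothetical.

```python
class IStructure:
    """Array-like structure whose storage is allocated before elements are defined."""

    def __init__(self, size):
        self._values = [None] * size
        self._present = [False] * size               # one presence bit per element
        self._deferred = [[] for _ in range(size)]   # readers waiting on each element

    def read(self, index, consumer):
        """Deliver the element to consumer now, or defer until it is defined."""
        if self._present[index]:
            consumer(self._values[index])
        else:
            self._deferred[index].append(consumer)

    def write(self, index, value):
        """Define an element and honor every read that was deferred on it."""
        if self._present[index]:
            raise ValueError("element already defined")
        self._values[index] = value
        self._present[index] = True
        for consumer in self._deferred[index]:
            consumer(value)
        self._deferred[index].clear()


received = []
s = IStructure(3)
s.read(1, received.append)   # element 1 is undefined, so the read is deferred
s.write(1, 42)               # defining the element releases the deferred read
```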

The  benefit  of  having  a  separate  structure  controller  is  in 
eliminating  the  overhead  of  creating  and  propagating  large,  complex, 
and  unwieldy  structures  in  the  same  manner  as  simple  tokens.  The 
drawbacks are in the introduction of sequential access and update (for
links and reference count) and the introduction of a memory access
bottleneck just as in sequential machines [Gajs82].

In contrast to the separate structure controller, the Manchester
Machine has special opcodes executed by the data flow machine itself
to support structured data. These opcodes manage a data type called
stream. This data type will be discussed in further detail in the
next chapter. In this section, a stream can be considered equivalent
to a file or array.

In the Manchester machine, the iteration level field of the token
label is used to differentiate elements of a stream. Examples of
stream ops include opcodes to generate and check the iteration level
field of the token label; to separate a stream into FIRST and REST;
to check for end-of-stream; and to add an element to the head of a
stream. The opcodes which set and update the token's iteration level
perform much the same function as instructions in a conventional
machine to compute the address of an array element. The difference is
in what happens to this label or address. On a sequential machine,
the  address  is  sent  to  the  memory  to  access  the  data.  On  a  data  flow 
machine,  the  label  is  attached  to  the  token.  The  token  then  is 
matched  with  its  partner  with  the  same  label,  and  the  two  are 
delivered  to  the  appropriate  instruction. 
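
The role of the iteration level as a matching key can be sketched in Python. The Token fields and the function name are assumptions made for illustration; the real label carries more fields than the single iteration level shown here.

```python
from collections import namedtuple

# iteration is the iteration-level field of the label; port is the input point.
Token = namedtuple("Token", "iteration port value")


def match_and_fire(tokens, op):
    """Pair tokens carrying the same iteration level and apply op to each pair."""
    waiting = {}    # iteration level -> token still awaiting its partner
    results = {}
    for tok in tokens:
        partner = waiting.pop(tok.iteration, None)
        if partner is None:
            waiting[tok.iteration] = tok
        else:
            left, right = (tok, partner) if tok.port == "L" else (partner, tok)
            results[tok.iteration] = op(left.value, right.value)
    return results


# Elements of two streams arrive out of order; the labels keep them apart.
arrivals = [Token(1, "L", 10), Token(0, "R", 5), Token(0, "L", 7), Token(1, "R", 3)]
differences = match_and_fire(arrivals, lambda a, b: a - b)
```

No address is ever sent to a memory: the label travels with the value, and the pairing by label selects the right stream elements.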

In addition, the Manchester data flow machine can store an entire
stream in a Storage Node. Elements of the stored stream are accessed
in a demand-driven fashion. Operations on stored streams include
maintenance of reference count, getting a pointer to the stream,
fetching stream elements, and garbage collection. Besides these extra
operations, there are matching store functions to support stored
streams. These functions include Increment, to increment a token of
type ordinal, which is used in a token label; Decrement; Preserve,
to make a copy of a token but leave it in the matching store; and
Defer, to defer storing a token in the matching store in case of
collision of labels [Kirk82]. The advantage of using the stored
stream rather than the simple stream is that a parameter to a function
or iteration which is of type array may be passed through a single
token (the address of the array). The disadvantage is that the system
may fill up with a large number of unconsumed tokens [Gurd81]. This
might occur because the Defer matching function is used to keep stored
stream tokens circulating on the ring.

II.3.5 Implications Of The Structured Data Problem

Regardless of the implementation strategy, creating and accessing
structured data can be a major source of bottleneck in a data flow
machine. Using a Structure Controller to store and retrieve tokens
forces sequential operation. It is necessary to read and update
pointers and reference counts sequentially in order to guarantee their
validity. With the I-structures, Gajski points out, it is difficult
to know ahead of time the optimal memory allocation scheme to
partition large arrays. Memory contention problems may occur for
frequently accessed elements stored in the same memory module. Gajski
observes that these are the same problems which affect vector machines
and other multiprocessors [Gajs82]. Incorporating stream handling
into the data flow processor can result in a large number of tokens
circulating through the machine.

In  generating  a  data  flow  template,  therefore,  it  is  beneficial 
to  simplify  the  structure  of  data  used  in  the  specification.  One  way 
to  do  this  is  to  recognize  cases  in  which  data  of  type  array  can  be 
reduced  in  dimension  in  the  generated  program.  When  a  two-dimensional 
array  can  be  reduced  to  a  one-dimensional  array  or  to  a  scalar, 
overhead in creating and accessing the array is reduced. The Model
processor attempts to partition the program graph so that array
dimensions  can  be  minimized  whenever  this  is  consistent  with 
maintaining  parallelism  inherent  in  the  problem.  Doing  so  reduces  the 
complexity  of  the  data  structure  which  the  data  flow  machine  must 
handle. 


II.4 THE MANCHESTER DATA FLOW MACHINE

This section describes the Manchester University data flow
machine. A prototype of the machine has been operational since 1981,
and  a  second  unit  is  under  construction.  An  emulator  for  the 
Manchester  machine  has  been  used  to  run  data  flow  programs  generated 
by  the  Model  system. 


II.4.1 Machine Layout

The Manchester machine consists of five functional units connected in
a pipelined ring. The machine architecture is illustrated in Figure
2.6. Input to and output from the ring are controlled by an LSI-11,
the host. Tokens travel on the ring in the following manner:

1. Tokens enter the system from the LSI-11 through the Switch. The
Switch is the machine interface to the LSI-11. The Switch can receive
tokens either from the Processing Unit or the LSI-11. It can route
tokens either to the Token Queue or the LSI-11.

2.  After  entering  the  system  through  the  Switch,  tokens  are  stored  in 
the  Token  Queue.  The  Queue  can  provide  temporary  storage  for  up  to 
16K  96-bit  tokens. 

3.  Next  on  the  ring  is  the  Matching  Unit.  This  Unit  gathers  pairs  of 
tokens  with  the  same  destination  node  address  and  label.  The  Unit 
operates  associatively.  When  a  token  arrives  at  the  Unit,  there  is  an 
associative  search  for  a  partner.  One  control  field  of  a  token  is  the 
matching store function. This function specifies what the Matching
Unit should do both in the case that a partner can be found and in the
case that a partner cannot be found. The most common matching store
functions are:

- Bypass (BY). This means that the token does not have a
partner. The instruction only needs one token, or else the other
token is a literal carried with the instruction. The token
bypasses the Matching Unit.

- Extract-Wait (EW). This is the normal action. If the partner
is found, the partner is extracted from the store and joined with
the incoming token. The two form a complete token set. If the
partner is not found, the token is stored in the Matching Unit to
await the partner.

Variations of these common actions have been added as matching
store functions so that the machine has primitives with which to
implement resource managers and to handle stored arrays. The
latter functions have been described above.

4. The token or token pair leaving the Matching Unit next addresses
the Node Store. This is the unit in which the data flow instructions
are stored. Each instruction consists of an opcode and destination
address(es). This information is added to the token(s) to produce a
group packet.

5. The group packet goes to the Processing Unit for execution.

6. The result goes to the Switch. Depending on the address, it may
either exit the machine or recirculate to the Token Queue.
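
The six steps above can be traced with a small Python simulation. It is a sketch only: the two-instruction program, the field names, and the dictionary used for the Node Store are hypothetical, and only the Bypass and Extract-Wait functions are modelled.

```python
from collections import deque, namedtuple

# func is the matching store function carried by the token: "BY" or "EW".
Token = namedtuple("Token", "node port label value func")

# Node Store: node address -> (operation, destination).
# A destination is (node, port, func); None means the result goes to the host.
NODE_STORE = {
    0: (lambda a, b: a + b, (1, "L", "BY")),   # add, result feeds node 1
    1: (lambda a: a * 2, None),                # double, result leaves the ring
}


def run_ring(initial_tokens):
    queue = deque(initial_tokens)   # Token Queue
    store = {}                      # Matching Unit store, keyed by (node, label)
    output = []                     # results routed to the host by the Switch
    while queue:
        tok = queue.popleft()
        if tok.func == "BY":                    # Bypass: no partner needed
            operands = (tok.value,)
        else:                                   # Extract-Wait
            key = (tok.node, tok.label)
            partner = store.pop(key, None)
            if partner is None:
                store[key] = tok                # wait for the partner
                continue
            left, right = (tok, partner) if tok.port == "L" else (partner, tok)
            operands = (left.value, right.value)
        op, dest = NODE_STORE[tok.node]         # Node Store builds the group packet
        result = op(*operands)                  # Processing Unit executes it
        if dest is None:                        # Switch: exit the machine
            output.append((tok.label, result))
        else:                                   # Switch: recirculate to the Queue
            node, port, func = dest
            queue.append(Token(node, port, tok.label, result, func))
    return output


tokens = [Token(0, "L", 0, 1, "EW"), Token(0, "L", 1, 10, "EW"),
          Token(0, "R", 1, 20, "EW"), Token(0, "R", 0, 2, "EW")]
results = run_ring(tokens)
```

Results leave the ring in the order in which partners happen to meet, not in label order, which is why each result is reported together with its label.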


II.4.2 Data And Instruction Formats

A token on the Manchester machine consists of a 96-bit word.
This word is divided into a 32-bit data field, a 36-bit label, an
18-bit destination address, and 10 bits for type information and
control. The destination is subdivided into a node address, the input
point (left or right), and the matching function. The label subfields
are discussed in Section II.3.1. Every token and literal has a data
type. The standard types are integer, positive integer, real,
character, and boolean. The ordinal, activation name, and label data
types are used in labels. Other types include types to create dynamic
arcs (for function return, for example), stream numbers,
end-of-stream, trigger, the set data type, error, and lambda-i, which
is a number in the range 0 to 2^24-1 indicating one of 2^24 different
user-defined types.
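
The token layout can be checked with a short Python sketch that packs the four fields into one word. Two things are assumptions here: the destination is taken as 18 bits, which is the width that makes the fields sum to 96 and which matches the 18-bit instruction destination, and the order of the fields within the word is chosen arbitrarily for illustration.

```python
DATA_BITS, LABEL_BITS, DEST_BITS, CTRL_BITS = 32, 36, 18, 10
TOKEN_BITS = DATA_BITS + LABEL_BITS + DEST_BITS + CTRL_BITS
FIELD_WIDTHS = (DATA_BITS, LABEL_BITS, DEST_BITS, CTRL_BITS)


def pack_token(data, label, dest, ctrl):
    """Pack the four fields into one integer, data field most significant."""
    word = 0
    for value, bits in zip((data, label, dest, ctrl), FIELD_WIDTHS):
        if not 0 <= value < (1 << bits):
            raise ValueError("field value does not fit in its width")
        word = (word << bits) | value
    return word


def unpack_token(word):
    """Recover (data, label, dest, ctrl) from a packed token word."""
    fields = []
    for bits in reversed(FIELD_WIDTHS):
        fields.append(word & ((1 << bits) - 1))
        word >>= bits
    ctrl, dest, label, data = fields
    return data, label, dest, ctrl
```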

An instruction consists of an opcode and a maximum of two
destinations. One of the destination fields may be used for a literal
input to the instruction. A maximum of 71 bits is used for an
instruction. There is a 12-bit opcode, 18 bits are available for one
destination, 32 bits are available for the second destination or for a
literal, and 9 bits are used for control.


The instruction set is divided into various sets of operators.
The ordinary arithmetic operators constitute the largest set. The
flow control operators include the branch operators. These double
result operators send the left input to the left or right output
depending on a condition involving the right input. There are
operators to manipulate label and destination fields, operators to
control dynamic arcs, data structure handling operators, and
input/output operators. The label-changing and data structure
operators have been discussed above. Dynamic arc control operators
are used for function call/return sequences. Input/output operators
provide a communication path to the host.

II.5 CONCLUSION

This  concludes  the  survey  of  data  flow  machines.  From  the 
discussion  above,  it  is  clear  that  many  different  approaches  have  been 
taken  to  implementing  a  machine  to  interpret  a  data  flow  graph.  The 
implementations  must  deal  with  issues  far  more  complex  than  simple 
expression  evaluation.  Reentrancy  of  graph,  program  allocation  to 
processors,  and  data  structures  are  major  problems  for  machine 
implementations.  Both  the  proposed  and  prototyped  implementations 
have  trouble  handling  data  structures. 

The next chapter discusses languages for data flow machines, both
existing languages and languages designed specifically for data flow
machines.


CHAPTER  III 


LANGUAGES  FOR  DATA  FLOW 

The development of suitable languages to program data flow
machines has been an area of active research. Languages suggested as
appropriate range from conventional programming languages [TITE80] to
new languages designed especially for data flow [Denn74] [Arvi78]
[Acke79] [McGr80]. In this chapter, we will examine some of these
programming languages. In addition to the high level procedural
programming languages, we will consider nonprocedural languages as
data flow languages. We conclude the chapter with a description of
the Model Specification Language.

III.1 CONVENTIONAL PROGRAMMING LANGUAGES

Existing  conventional  programming  languages  have  been  used  to 
program  data  flow  machines.  There  are  several  advantages  to  using 
conventional languages. One advantage is that there is a considerable
body of programs already written in such languages as FORTRAN.
Compiling FORTRAN to a data flow machine language makes those programs
available immediately to the new machines. Another advantage is that
there is a considerable body of programmers conversant in such
conventional languages as FORTRAN. Proponents of the use of
conventional languages for data flow argue that programmers will find
it difficult to program with the unfamiliar syntax and functional
semantics of the new data flow languages. In addition they claim that
a sufficiently smart optimizing compiler can recover much of the
parallelism from a sequential program [Gajs82].

The programming language for the TI DDP is FORTRAN. FORTRAN was
chosen primarily for pragmatic reasons: TI has a large investment in
supporting FORTRAN on the TI Advanced Scientific Computer (ASC), and a
library of scientific application oriented benchmark programs
developed for the ASC could be used on the DDP. The ASC FORTRAN
compiler was modified to produce program graphs for the DDP [TITE80].

Although existing software and familiarity with it are powerful
arguments for using conventional languages for data flow, there are
also disadvantages. A conventional programming language such as
FORTRAN mirrors almost exactly the processor-memory paradigm of
sequential machines. It does not follow the functional semantics of
the data flow machines. Computation in FORTRAN is not accomplished
strictly by function application. The tasks of assigning values to
variables, fetching stored values of variables, and updating values of
variables are essential parts of the language. Even the most
sophisticated compiler cannot recover all the parallelism in the
algorithm which has been disguised by the sequential nature of the
programming language. In addition, the programmer is forced to
(over)specify the algorithm in the sequential paradigm even if this is
not the most natural way to describe the problem. The programmer must
supply a linear sequence of instructions. Parallelism which might
have existed in one form of problem description might be completely
absent when the problem is translated to FORTRAN.

III.2 DATA FLOW LANGUAGES

III.2.1 Common Properties

Even though conventional languages will undoubtedly be used on
data flow machines (for back compatibility, among other reasons),
functional languages also have been designed for data flow. In this
type of programming language, a computation is accomplished by
function application, that is, by supplying parameters to a function
and evaluating the function. The results produced by the function
depend solely on the parameters supplied. The context or environment
in which the function was invoked has no bearing on the results
produced. There are two immediate benefits when programming is
accomplished as function application. First, already mentioned, is
the simplified semantics of the programming language. Another is the
ability to provide in the language powerful combining forms for
synthesizing functions into a program. Instead of rewriting the same
basic algorithm in differing contexts, it becomes possible to isolate
functional components of the algorithm, and then put those components
together in different ways as the particular situation demands.

Languages such as Id, Val, the I MJ programming language, and MaD
(Manchester Data flow) have been defined for various data flow
machines. These languages follow the functional semantics of data
flow machines. The basic language form is the expression. All of the
languages are single-assignment. In addition, most of them support
two new constructs: the parallel loop FORALL as distinguished from
the iterative loop ITER, and the STREAM data type.

III.2.2 Iteration

Just as WHILE and REPEAT loops in conventional languages allow
controlled branching, the ITER construct in data flow languages allows
controlled reassignment to variables [Acke82]. The ITER block is used
to express recurrence relations, such as in computing factorial. The
ITER block consists of three parts. First there is an initialize part
to give iteration variables their initial values. Then there is the
body of the iteration. Only in this section may a variable be
reassigned a value. The iteration variables appear here as target of
assignment. The new values of all the iteration variables are
available only at the end of the block, so that within the block, the
single assignment rule is still enforced. The third part of the ITER
is the test for termination. An example of an ITER block is the
factorial program [Acke82]. In Id, the program is written as follows:

(initial J <- N; FACT <- 1;      /* initialization */
 while J <> 0 do                 /* termination condition */
     new J <- J - 1;             /* controlled */
     new FACT <- FACT * J;       /* reassignment */
 return FACT)                    /* value returned */

The program in MaD is almost identical to the Id version.
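
The rule that new values become available only at the end of the block can be mimicked in Python with a simultaneous assignment. This is an analogy under that one assumption, not a translation performed by any data flow compiler.

```python
def factorial(n):
    j, fact = n, 1                     # initialize part
    while j != 0:                      # test for termination
        # Both right-hand sides are evaluated with the old values of j and
        # fact; the new values take effect together, as at the end of an
        # ITER block.
        j, fact = j - 1, fact * j
    return fact                        # value returned
```

Writing the two updates as separate sequential statements in the order shown in the Id program would multiply by the already decremented J and yield zero, which is the difference between controlled reassignment and ordinary sequential assignment.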

III.2.3 Parallel Constructs

The FORALL construct allows the programmer to construct a block
which may have multiple independent incarnations. The scope of
repetition may be a set of indices, as in VAL, or a stream of values,
as in MaD. The ITER and FORALL represent two classes of blocks,
iterative  and  parallel.  There  is  a  direct  correspondence  in  the 
translation  of  the  data  flow  template  to  a  data  flow  language  between 
a  parallel  or  iterative  block  in  the  template  and  a  parallel  or 
iterative  program  block.  The  two  kinds  of  blocks  are  the  fundamental 
units  which  the  Model  data  flow  scheduler  generates  from  the 
nonprocedural  specification. 

III.2.4 Streams

The stream was introduced earlier by analogy to a file or array.
Here we define the data type more formally. A stream is defined to be
a possibly infinite sequence of elements. The elements of a stream
have a total linear ordering and are not required to exist
simultaneously [Arvi78]. The stream data type is useful in handling
I/O in data flow languages. A sequential file can be thought of as a
stream of elements. The stream data type can also be used to
implement arrays. A two-dimensional array of integers, for example,
can be implemented as a stream of a stream of integers. If a stream
is a parameter to a function, the function operates on the input
stream as it is read from an input device and produces an output
stream.  Operations  on  streams  include  extracting  the  first  element 
(CAR),  the  rest  of  the  stream  (CDR),  and  constructing  a  stream  out  of 
simple  tokens. 
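
A possibly infinite stream whose elements need not exist simultaneously can be sketched in Python as a lazily evaluated pair. The representation (a head value plus a function that produces the rest on demand, with None as the empty stream) and all function names are assumptions chosen for illustration.

```python
def cons(head, tail_thunk):
    """Construct a stream from a first element and a deferred rest."""
    return (head, tail_thunk)


def first(stream):
    """CAR: the first element of the stream."""
    return stream[0]


def rest(stream):
    """CDR: the rest of the stream, computed only when asked for."""
    return stream[1]()


def integers_from(n):
    """An infinite stream n, n+1, n+2, ..."""
    return cons(n, lambda: integers_from(n + 1))


def take(stream, k):
    """Collect the first k elements; None marks the end of a finite stream."""
    out = []
    while stream is not None and k > 0:
        out.append(first(stream))
        stream = rest(stream)
        k -= 1
    return out
```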

III.3 THE MAD PROGRAMMING LANGUAGE

MaD is a high level data flow language designed at Manchester
University [Bowe82]. It is considered an interim effort toward a data
flow language for the Manchester machine. MaD is single assignment
and function-oriented. The program unit is similar in form to a
Pascal program. We will describe here a subset of the language which
has been used in translating the Model data flow templates into MaD.
The complete BNF is included in Appendix A.

In the following BNF description, optional language elements are
bracketed by "[" and "]". A "*" following the right bracket denotes
zero or more repetitions, and a "+" following the right bracket
denotes one or more repetitions. A "|" is used to separate options in
the expansion of the left hand side. Any punctuation mark used in the
language itself is quoted (for example, ';'). Syntactic elements are
put in <...>. Syntactic elements ending in "id" usually denote an
identifier.

III.3.1 The Program Heading

<programme> ::= PROGRAM <program-id>
                [ <parameterlist> ] <result> ';'
                [ <typedeclaration> ]
                [ <funcdefns> ]
                [ <expression> ]
                END
                [ <assembly-code> ]

The program heading has an optional parameter list. However, the
<result> is required. There may follow type declarations and nested
function declarations. The <expression> is the result returned by the
program. Assembly language code may follow the END statement.

<parameterlist> ::= '(' <parmlist> [ <parmlist> ]* ')'

<parmlist>      ::= <parmid> [ <parmid> ]* ':'
                    [ [STORED] STREAM [STREAM]* ]

<result>        ::= '[' <result> [ ',' <result> ]* ']'
                    | [ [STORED] STREAM ] <typeid>


Example: PROGRAM FACTORIAL (N: INTEGER): STREAM INTEGER;

The <parameterlist> is a standard Pascal format parameter list. The
<result> shows the data types of values returned by the program. The
header  may  either  be  a  composite  structure  or  may  refer  to  a 
predefined  type.  The  stream  and  stored  stream  data  types  are 
described  in  Section  II. 3. 4. 

III.3.2 Structured Types In MaD

Structured types as defined in MaD are slightly different from
those defined by Pascal. The stream data type is used instead of
array or file. Records cannot be declared as in Pascal. Structures
can be defined; however, fields cannot be named individually. An
expression consisting of a composite structure can be created as an
"informal" record. For example, the expression [5.2, 1*5] is a
composite structure consisting of a real and an integer.

When the value of a structured variable is defined, the entire
structure must be defined by one expression. Individual structure
elements may not be defined separately. For example, the statement
A[5] := 10 is invalid in MaD because a specific element of A is being
defined. In addition, structured variables must be defined strictly


X := FUNC(K);
RETURN  ALL  X; 
RETURN  ALL  X2; 


In this example IXSET and JXSET are assumed to be streams of integer
indices from 1 to the range of the first dimension of XI for I, and
from 1 to the range of the second dimension of XI for J. Each
instance of the inner loop defines one row of the matrix XI.
Individual elements of a structured variable such as XI cannot be
defined separately. When a structured variable is referenced in an
expression, however, individual components may be selected. Other
examples of definition and usage of arrays are given below.


III.3.3 The Expression

<expression> ::= DECLARE <block> |
                 IF <condexp> |
                 <basicexp> |
                 CASE <casexp>

The <expression> can be one of four constructs: a DECLARE block,
an IF expression, a basic expression, or a CASE expression. The first
two are described here. The basic expression is defined in the next
section. The case expression is not used in translating templates to
MaD. It is described in the complete MaD BNF in Appendix A.

<block>      ::= <id> [ ',' <id> ]* ':'
                 <typedefn> ';' <legalblock>

<legalblock> ::= [ <id> [ ',' <id> ]* ':'
                 <typedefn> ';' ]
                 <let> |
                 <initforwhile>


Local variables may be declared in a block. Scope rules for variable
declaration are the same as for conventional block structured
languages.  Following  the  local  variable  declarations,  the  block  may 
contain  either  a  LET  statement  containing  one  or  more  assignments  or  a 
looping  construct. 

<let> ::= LET [ <lhs> ':=' <expression> ]+
          RETURN <expression>

<lhs> ::= <id> | '[' <lhs> [ ',' <lhs> ]* ']'

In a LET statement, a previously declared variable may be assigned a
value. Each variable may be defined only once in the program. Since
the block must return a value, the LET must be terminated by an
<expression> to be returned. The value of the block is the value of
this expression.

Example:

LET XSTREAM :=
    DECLARE
        XI, AI, BI: INTEGER;
        FOR EACH AI IN A; EACH BI IN B DO
            XI := AI + BI;
        RETURN ALL XI;

In this example, XSTREAM is assumed to be of type STREAM INTEGER. The
value of XSTREAM is computed by the above DECLARE block. This DECLARE
block contains a structured loop.
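
A rough Python analogue of this example is given below. It assumes that the two EACH clauses advance through A and B in step with one another and that the loop ends at the first end-of-stream; the source does not state this, so the pairing rule should be read as an assumption. The function name xstream is hypothetical.

```python
def xstream(a, b):
    """Yield one sum per loop instance, like RETURN ALL XI."""
    for ai, bi in zip(a, b):    # assumed lockstep selection from A and B
        yield ai + bi           # XI := AI + BI


sums = list(xstream([1, 2, 3], [10, 20, 30]))
```

Because the function is a generator, each element of the result is produced only when it is requested, which fits the view of XSTREAM as a stream instead of a fully materialized array.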


<initforwhile> ::= [ INIT [ <lhs> ':=' <expression> ]+ ]
                   FOR EACH <id> IN <streamid>
                   [ EACH <id> IN <streamid> ]*
                   DO
                   [ WHILE <expression> DO ]
                   [ [NEW] <lhs> ':=' <expression> ]+
                   RETURN <expression>


A  structured  loop  consists  of  three  parts.  The  optional 
initialise  part  assigns  initial  values  to  variables  declared  locally 
in the block. The FOR EACH identifies streams whose elements are
selected within the loop. The optional WHILE part sets up termination
conditions for the loop. If the WHILE is not used, end-of-stream
signals loop termination. Next in the loop come the assignment
statements. In this section, the loop variables may be reassigned.
The NEW qualifier indicates reassignment. The final part of the loop
is  the  RETURN  statement.  The  value  of  the  block  is  the  value  of  the 
RETURN  expression. 


Example:

S :=
    DECLARE I: INTEGER; S: TS;
    [* TS is a user-defined type *]
    INIT S := ((PA + F[N+1]) / 2);
         I := 1;
    WHILE I <= SIZES - 1 DO
        NEW S := S + F[I+1];
        NEW I := I + 1;
    RETURN S;


III.3.4 The Basic Expression

<basicexp> ::= <all-remainder> |
               <simpleexp> [ <relop> <simpleexp> ]

A basic expression in MaD is either an expression using the
stream constructs ALL or REMAINDER or a more conventional arithmetic
expression.

<all-remainder> ::= ALL <basicexp> [ BUT <basicexp> ] |
                    REMAINDER [ <id> ]

ALL constructs a stream by concatenating together each instance
of <basicexp> in the loop. The BUT qualifier allows some of the
stream members to be filtered out. The REMAINDER <id> must be a
stream identifier referenced in a FOR EACH. Examples of these
expressions occur in the FOR EACH loop above.

A simple expression is a standard arithmetic expression augmented
with a couple of unusual features. The operators +, *, AND, OR,
MAX, and MIN can be used as reduction operators. Similar reduction
operators are used in Model. An "informal" record may be constructed
in MaD with the left and right bracket notation described above.
Operations may be performed with portions of a stream by specifying
high and low bounds for the stream subset.


In  addition  to  user-defined  functions,  the  language  provides 
standard  mathematical  functions  such  as  SIN,  COS,  LN,  SQRT,  and 
exponentiation.  Stream  functions  include  the  CONS  to  construct  a 
stream,  FIRST,  REST,  GET  (for  a  stored  stream),  EMPTY,  and  SIZE. 

Example: 

IF EMPTY(XSTREAM) THEN 1 ELSE 1 + FIRST(XSTREAM)/((1 +
    FUNC(REST(XSTREAM))))

In  this  basic  expression,  XSTREAM  is  assumed  to  be  a  numeric  stream. 
FUNC  is  an  arbitrary  function  whose  call  parameter  is  a  numeric 
stream.  The  standard  stream  functions  EMPTY  and  FIRST  are  used  in  the 
expression. 

III.4 NONPROCEDURAL LANGUAGES

III.4.1 Common Properties

Nonprocedural languages have been suggested as being particularly
well suited to data flow. The Model nonprocedural language has only
two statement forms, data description and assertion. The assertions
describe output data in terms of input data. The term "assignment",
whether single or multiple, is not applicable to nonprocedural
languages. Data cannot be assigned. Instead, using the assertions,
one can describe properties of the data items. The dependent data
item, on the left hand side of the assertion, is described as a
function of one or more independent data items. This defining function
is on the right hand side of the assertion. The use of these two
simple statement forms allows a very high level form of problem
description.

Environment  or  context  is  also  missing  in  a  nonprocedural 
language.  The  validity  of  the  assertions  is  not  time  dependent.  It 
is  not  tied  to  a  state  of  the  machine.  In  fact,  a  nonprocedural 
language  has  no  sequencing  or  control  constructs.  Data  dependencies 
alone  control  the  sequence  of  execution.  There  might  be  several 
linear execution sequences of a specification which are correct. From
the discussion of the data flow model in Section II.1, it is clear
that  specification  languages  follow  exactly  the  semantics  of  the 
model.  The  availability  of  data  drives  program  execution.  Since 
there  is  no  programmer-controlled  sequencing,  a  data  flow  graph  can  be 
constructed  directly  from  the  specification. 
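
The idea that data dependencies alone fix the order of execution can be sketched in Python. The three assertions and their names are hypothetical, the dictionary encoding is an assumption and is not Model syntax, and graphlib requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Each assertion maps a target item to (source items, defining function).
# The assertions are deliberately listed in an order that could not be
# executed top to bottom.
assertions = {
    "net":   (("gross", "tax"), lambda gross, tax: gross - tax),
    "tax":   (("gross",), lambda gross: gross * 0.25),
    "gross": (("hours", "rate"), lambda hours, rate: hours * rate),
}


def evaluate(assertions, inputs):
    """Evaluate every assertion in an order derived from data dependencies."""
    graph = {target: set(deps) for target, (deps, _) in assertions.items()}
    values = dict(inputs)
    for item in TopologicalSorter(graph).static_order():
        if item in assertions:                 # input items have no assertion
            deps, fn = assertions[item]
            values[item] = fn(*(values[d] for d in deps))
    return values


values = evaluate(assertions, {"hours": 40, "rate": 10})
```

Any topological order of the dependency graph is one of the correct linear execution sequences; a data flow machine is free to overlap items that do not depend on one another.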

Nonprocedural languages have the same functional semantics as the
applicative data flow languages described above. However, a language
such as Model is at a higher level than the data flow languages. A
programmer using a data flow language must recognize iterative and
parallel aspects to the algorithm and must explicitly construct
iterative and parallel blocks in the program. In contrast, a
programmer using Model need only specify operations on data structures
and arrays. The Model Processor detects iteration and parallel
invocation from the subscript expressions used in array reference. It
then generates the appropriate kind of block in the data flow
template.

III.4.2 LUCID

LUCID is a nonprocedural language invented by Ashcroft and Wadge
[Ashc77]. The design of LUCID was motivated by the desire to combine
program writing and program proving into a single language. The
assertions in LUCID are the axioms from which properties of the
program are derived. LUCID has sequencing control operators such as
FIRST X and NEXT X to specify values occurring during an iterative
computation. The AS SOON AS operator can be used to extract values
from a loop. A data item in LUCID can be thought of as an infinite
sequence. The elements of the sequence form the history (or "world
line") of the data item in an iterative computation. This is
equivalent to the STREAM type in data flow languages.

III.4.3 MODEL

The MODEL nonprocedural language has many concepts in common with
LUCID. The "world line" analogue in MODEL is one dimension of an
array. The elements of the array represent successive values of a
data item during an iterative computation. This notation is more
powerful than that of LUCID because data element values other than the
current, next, and first can be referenced by using the appropriate
subscript expression. The notation also makes it easy to refer to
multi-dimensional structures, since the same array notation can be
used. In LUCID, multi-dimensional structures must be constructed with
nested loops.

Iterative  loop  termination  can  be  expressed  in  MODEL  in  several 
ways:  the  array  can  be  given  a  constant  upper  bound;  the  SIZE 
attribute  of  the  array  can  be  defined  in  an  assertion;  or  the 
END.array condition can be defined in an assertion. The use of these
attributes is described in Section III.5.4.

The objectives of MODEL are different from those of LUCID. MODEL
has been designed as a software tool to automate program
development. The system does extensive consistency and completeness
analysis of a specification. The sequential Model Processor generates
a PL/1 program from the specification [LuKa82]. The aim of the MODEL
program generation system is to provide a language for very high level
specification of a problem by non-programmers. The system has been
designed to resolve ambiguities or errors in the specification and
report corrections to the user [Shas78]. Since the work reported here
has been done with the Model System, we describe a subset of the Model
Specification Language in greater detail in the following section.


III.5 THE MODEL SPECIFICATION LANGUAGE


A specification in Model consists of a program header followed by
data  declarations  and  assertions.  The  header  contains  the 
specification  name  and  the  names  of  the  input  parameters  to  and  output 
results  from  the  specification.  The  input  and  output  parameters  must 
be  of  type  file.  The  following  is  an  example  of  a  program  header: 

MODULE:  MATRIXMUL; 

SOURCE:  INFILE; 

TARGET:  OUTFILE; 

This  header  indicates  that  the  specification  name  is  MATRIXMUL  and 
that there is one input file, INFILE, and one output file, OUTFILE.
The  files  are  assumed  to  be  sequential.  There  must  be  a  target  file 
declaration:  the  specification  must  have  output.  The  data  structures 
of  the  files  are  given  in  the  data  declaration  statements. 

III.5.1 Data Declaration

Model supports structured data in the syntax style of PL/1. The
data declaration statement defines the structures of the input and
output files and of any interim data used. Levels of structure
include FILE, RECORD, GROUP, and FIELD. The RECORD is the unit of
input/output. The FIELD is the smallest unit of data. The field must
be of elementary data type. Some representative data declaration
statements follow. Note that each statement is independent of any
other. The order is not significant to the Processor. The order of
statements and indentation used merely enhances readability.


INFILE IS FILE (INREC);
  INREC IS RECORD (IN1, IN2);
    IN1 IS GROUP (IN12(10));
      IN12 IS GROUP (A(10));
        A IS FIELD (NUMERIC);
    IN2 IS GROUP (IN21(10));
      IN21 IS GROUP (B(10));
        B IS FIELD (NUMERIC);

OUTFILE IS FILE (OUTREC);
  OUTREC IS RECORD (OUT1);
    OUT1 IS GROUP (OUT2(10));
      OUT2 IS GROUP (C(10));
        C IS FIELD (NUMERIC);

XD IS GROUP (X1(10));
  X1 IS GROUP (X2(10));
    X2 IS GROUP (X(10));
      X IS FIELD (NUMERIC);


These declarations indicate that the input file consists of one
instance of one record, INREC. INREC contains two elements, IN1 and
IN2, which are also aggregates. A and B are the leaf nodes of this
tree structure. They are fields. INREC therefore consists of two
matrices. The output file is similarly defined. OUTREC is also a
matrix. XD illustrates the declaration of an interim structure.


III.5.2 Array Data


Array  data  has  two  major  uses  in  Model.  In  addition  to  the 
conventional  usage  of  array  in  the  mathematical  sense  of  vector  or 
matrix,  the  array  is  also  used  as  an  indication  of  repetition. 
Iterative  computation  can  be  expressed  in  Model  using  the  same  array 
notation as for matrix manipulation. Examples of the use of arrays in
iteration  are  given  in  the  next  section.  Here,  we  describe  several 
special  constructs  which  support  use  of  arrays. 

III.5.3 Subscript Data Type

The SUBSCRIPT data type is a unique feature of Model which helps
describe the index of an element in an array. A variable declared as
type SUBSCRIPT may assume all values in a range from 1 to some upper
bound. For example, the declaration
I IS SUBSCRIPT (100);

defines I to have all values in a range from 1 to 100. When I is used
to qualify an array variable, as in the reference A(I), the entity
A(I) denotes all 100 elements of A. In contrast, if X is defined by
the statement
X IS FIELD (NUMERIC);

then the reference A(X) denotes only one element of A: the element
whose index corresponds to the value of X.
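
The difference between the two kinds of reference can be sketched in
Python. This is only an illustration: the function names and the
sample values are assumptions, and Python lists are 0-based whereas
Model arrays start at 1.

```python
# Illustrative sketch only; names and data are hypothetical.
A = [10 * n for n in range(1, 101)]   # stands in for A(1) .. A(100)

def ref_by_subscript(array, upper_bound):
    """A(I) with I IS SUBSCRIPT (upper_bound): every element 1..upper_bound."""
    return [array[i - 1] for i in range(1, upper_bound + 1)]

def ref_by_field(array, x):
    """A(X) with X a NUMERIC field: the single element whose index is X."""
    return array[x - 1]

all_elements = ref_by_subscript(A, 100)   # the whole array
one_element = ref_by_field(A, 7)          # only A(7)
```

A subscript therefore behaves like an implicit loop variable, while a
field used as an index behaves like an ordinary integer value.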


Model  distinguishes  between  global  and  local  subscripts.  The 
explicit  subscript  declarations  define  global  subscripts.  Local 
subscripts  are  of  the  form  SUBn,  where  n  is  a  positive  integer.  They 
are  predefined  to  the  Model  Processor.  The  subscript  is  local  to  an 
assertion.  The  name  may  be  reused  in  other  assertions  to  denote 
different  ranges. 

III.5.4 Range Arrays

The number of elements in one dimension of an array is called the
range of that dimension of the array. An array dimension may have a
constant range definition, or its range may be defined by either of
two system-defined auxiliary arrays, SIZE and END. These are called
range arrays.

If A is an n-dimensional array, then SIZE.A is a k-dimensional
integer array, where 0 <= k < n. SIZE.A defines the range of A's
rightmost (least significant) dimension. The range of each dimension
of SIZE.A is equal to the range of the corresponding dimension of A.
However, SIZE.A must have fewer dimensions than A. For instance, if A
is a vector, then SIZE.A is a scalar. If the value of SIZE.A is 15,
this implies that A has fifteen elements. The SIZE array may be
defined in an assertion to establish the range of the last dimension
of the corresponding array. The value of the SIZE.A array cannot
depend on a subscript j of A where [...]. Figure 3.1 shows the
correspondence between a two-dimensional array and its SIZE array.


FIGURE 3.2 A 2-DIMENSIONAL ARRAY AND ITS END ARRAY
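
As a rough illustration of how a SIZE array relates to its array,
the following Python sketch uses a two-dimensional array whose rows
have different lengths. The data and the helper name are assumptions.
In Model the SIZE array would be defined by an assertion rather than
derived from stored data as it is here.

```python
# Hypothetical data: a 2-D array A whose rightmost dimension varies by row.
A = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

# SIZE.A has one dimension fewer than A: SIZE_A[i] is the range of the
# rightmost dimension of A for row i.
SIZE_A = [len(row) for row in A]

def elements(a, size_a):
    """Visit A(i, j) for j = 1 .. SIZE.A(i), as a nested loop would."""
    return [a[i][j] for i in range(len(a)) for j in range(size_a[i])]
```

The outer dimension of SIZE_A has the same range as the outer
dimension of A, which matches the rule that each dimension of SIZE.A
has the range of the corresponding dimension of A.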


III.5.5 Assertion


The assertion is used to define the values of variables declared
in the specification. An assertion in Model is of the form
<target variable> = <expression>

The target variable, or dependent variable, may be a scalar or may be
subscripted. If the variable is subscripted, a subscript expression
may not be an arbitrary expression. It must be of the form
<global subscript name> or <local subscript name>. This
restriction exists because the target language KaD does not support
the structure operations SELECT and APPEND. Instead, a structure must
be completely defined by one statement. Use of an arbitrary subscript
expression implies selective definition of one array element rather
than definition of each element. The expression on the right hand
side contains the source, or independent, variables. The expression
may be an arithmetic or conditional expression. Subscripts on the
right hand side may contain arbitrary subscript expressions.

A conditional expression is of the form
IF <boolean expression> THEN <expression>
                        ELSE <expression>

The following are examples of Model assertions:

X(I,J,K) = A(I,K) * B(K,J);

S(I) = IF I = 1 THEN (PA + P(N))/2;
       ELSE S(I-1) + P(I-1);

END.J(SUBI,SUBJ) = IF J(SUBI,SUBJ) >= IFILE.N THEN TRUE;
                   ELSE FALSE;

In the first assertion we assume that I, J, K are of data type
SUBSCRIPT. With this assumption in mind, we can see how the assertion
illustrates the power of the specification language. In a
conventional high level programming language, the statement would have
to be enclosed in a three level loop. In Model, use of the subscript
data type permits the programmer to specify operations on a
three-dimensional array with a single assertion. The second assertion
illustrates the use of recurrence relations. In this example, the
THEN clause establishes the initial condition because it defines the
value of S for the first iteration. The ELSE clause defines the
recurrence relation. The termination condition for the recurrence
could be established in any of the ways discussed above: with range
arrays or through the range of the subscript I. The third assertion
illustrates use of the range array END. Note that the value of END.J
depends on data available only during program execution.
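
To make the comparison with a conventional language concrete, here is
a Python sketch of what the first two assertions compute. It is not
output of the Model Processor. The function names, the parameter
`size` (the range of I), and the use of 0-based lists are assumptions
made for the illustration.

```python
def product_terms(A, B):
    """X(I,J,K) = A(I,K) * B(K,J): one assertion, three nested loops."""
    n, m, p = len(A), len(B), len(B[0])
    return [[[A[i][k] * B[k][j] for k in range(m)]
             for j in range(p)]
            for i in range(n)]

def recurrence(PA, P, size):
    """S(I) = IF I = 1 THEN (PA + P(N))/2 ELSE S(I-1) + P(I-1)."""
    N = len(P)
    S = [None] * (size + 1)          # slot 0 unused, to keep 1-based indices
    for I in range(1, size + 1):
        if I == 1:
            S[I] = (PA + P[N - 1]) / 2    # initial condition (THEN clause)
        else:
            S[I] = S[I - 1] + P[I - 2]    # P(I-1) in 1-based terms
    return S[1:]
```

In the sketch the loop bounds and the evaluation order are written
out by hand. In Model both are derived from the subscript ranges and
the data dependencies.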

III. 6  THE  EXAMPLE  SPECIFICATION 

The specification shown in Figure 3.3 below demonstrates some of
the  constructs  in  the  Model  Language.  This  specification  is  also  used 
as  an  example  in  the  following  chapters. 


MODULE: EXAMPLE;
SOURCE: INFILE1, INFILE2;
TARGET: OUTFILE;

INFILE1 IS FILE (INREC1(100));
  INREC1 IS RECORD (A);
    A IS FIELD (NUMERIC);
INFILE2 IS FILE (INREC2(100));
  INREC2 IS RECORD (B);
    B IS FIELD (NUMERIC);

OUTFILE IS FILE (OUTREC(100));
  OUTREC IS RECORD (C);
    C IS FIELD (NUMERIC);

XD IS GROUP (X(100));
  X IS FIELD (NUMERIC);

I IS SUBSCRIPT (100);

/* Assertion 1: AASS220 */
X(I) = A(I) + B(I);

/* Assertion 2: AASS230 */
C(I) = X(I) * X(I);

END EXAMPLE;


Figure  3.3  The  EXAMPLE  Specification 


A BNF description of Model is included in Appendix B.
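
For readers more familiar with conventional languages, the
computation described by the EXAMPLE specification of Figure 3.3 can
be sketched in Python as follows. The function name and the use of
lists in place of sequential files are assumptions.

```python
def example(a_records, b_records):
    """Mirror the two assertions of EXAMPLE, one value per record."""
    x = [a + b for a, b in zip(a_records, b_records)]   # X(I) = A(I) + B(I)
    c = [v * v for v in x]                              # C(I) = X(I) * X(I)
    return c
```

Each C(I) depends only on A(I) and B(I) for the same I, so the 100
instances could be evaluated in any order or concurrently.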

III.7 CONCLUSION

This  concludes  the  discussion  of  languages  for  data  flow.  We 
have  described  novel  features  of  data  flow  languages.  Data  flow 
languages  distinguish  between  iterative  and  parallel  loops  and  have  a 
new  data  type,  the  stream,  which  is  used  for  I/O  and  for 


CHAPTER IV


THE  MODEL  SYSTEM 

This chapter describes the array graph and other data structures
of the Model System which are used in scheduling and code generation
[LuKa82] [Pryw82a]. The first step in generating a high level
language program from a specification is syntax analysis. The Model
System reads the specification and checks the syntax. If any errors
are found, they are reported, and the system halts. If the
specification is verified to be syntactically correct, it is then
checked for semantic correctness, completeness, and consistency. In
many cases, incompleteness and ambiguity in the specification are
corrected, and warnings reported to the user [Lock82]. If it is not
possible for the system to correct the specification, error messages
are issued, and further analysis is curtailed. If the specification
passes these stages, the system makes an internal representation of
each entity in the specification, and builds the array graph. The
array graph is the major input to the scheduler.

The Model System implementation is described in [LuKa81]. In
addition, a description of a more powerful general-purpose programming
variant of Model may be found in [Shas78]. [Gree81] describes a
Model variant which can generate a PL/1 program to solve a set of
simultaneous equations. [Pryw82b] is a report on using Model for
cooperative computation in a distributed computing environment.

The  next  section  describes  the  underlying  graph  of  a 
specification.  Next,  we  describe  the  array  graph,  node 
dimensionality,  and  the  precedence  relations  which  determine  the  edges 
of  the  graph.  The  concepts  of  range  and  range  set  are  introduced 
next.  The  final  section  defines  the  storage  allocation  attributes  of 
physical  and  virtual  storage. 

IV.1 THE UNDERLYING GRAPH

A  natural  way  to  represent  the  specification  for  analysis  is  in 
the  form  of  a  graph.  There  is  a  node  in  the  graph  for  each  entity  in 
the  specification.  Edges  are  inserted  into  the  graph  to  indicate 
precedence  relationships  between  the  nodes.  To  be  an  accurate 
representation  of  the  specification,  the  graph  should  contain  a  node 
for  each  array  element  and  a  node  for  each  instance  of  an  assertion. 
For example, if variables A and B are assumed to have 10 elements
each, then the assertion

A(I) = B(I)*5

has 10 instances, one for each value of I from 1 to 10. This graph is
called the underlying graph of the specification. If such a graph
were constructed, it would certainly be very large. In addition, in
many cases the graph could not be constructed because the number of
elements in an array might not be determined until run time. Thus the
system would not know at compilation time how many nodes to create.
For these reasons, a compact form of the underlying graph is
constructed instead of the underlying graph itself. This graph is
called the array graph. It is so named because each node and each
edge may be multi-dimensional. A node represents an entire array (of
zero or more dimensions) and an edge represents the corresponding
array of relationships between nodes.
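
The size difference between the two graphs can be illustrated with a
small Python sketch for the assertion A(I) = B(I)*5. The node naming
scheme and the tuple representation of edges are assumptions made for
the illustration, not the representation used by the Model System.

```python
def underlying_graph(n):
    """One node per array element and per assertion instance."""
    nodes, edges = [], []
    for i in range(1, n + 1):
        b, s, a = f"B({i})", f"assertion({i})", f"A({i})"
        nodes += [b, s, a]
        edges += [(b, s), (s, a)]     # B(i) -> instance i -> A(i)
    return nodes, edges

def array_graph():
    """One node per declared item and per assertion, whatever the range."""
    nodes = ["B", "assertion", "A"]
    edges = [("B", "assertion"), ("assertion", "A")]
    return nodes, edges
```

The underlying graph grows linearly with the range and needs the
range to be known in advance. The array graph has a fixed size.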

There  is  a  node  in  the  array  graph  for  each  data  item  declared  in 
the  specification  and  each  assertion.  In  addition,  nodes  are  created 
for  range  arrays  if  these  are  used  in  assertions.  Such  basic 
information  as  the  node  type  (file,  record,  group,  field,  assertion, 
etc.), node dimensionality, and node predecessors and successors is
stored  with  the  node. 

IV.2 DATA STRUCTURES DEFINING THE ARRAY GRAPH

The following description of the array graph is adapted from
[Pryw82a]. The array graph is represented by three data structures:

- DICT. A dictionary of nodes. There is a NODE entry in the
dictionary for each assertion and each data item.

- NODE. All the attributes of interest for an assertion or data
item are stored in the NODE data structure. Figure 4.1 lists the
NODE attributes. Figure 4.2 shows the data structure of a local
subscript list entry of a node. The local subscript list is a
description of the node dimensions.

- EDGE. Information about the relationships between nodes is
stored in the EDGE data structure. The EDGE records the source
and target of the edge, and the subscript expressions used
in the edge. Figure 4.3 shows the data structure of the EDGE,
and Figure 4.4, the data structure of a subscript expression.
Figure 4.5 shows the nodes and edges in the array graph for the
specification EXAMPLE (Figure 3.3).


- NODE_ID. Node number and name.

- NODE_TYPE. Node type (data or assertion).

If the NODE_TYPE is data, the NODE contains the following attributes:

- PARENT: the array graph node number of the parent of this
node.

- #SONS: the number of immediate descendants of this node.

- SON1: the node number of the leftmost descendant.

- BROTHER1: the node number of the sibling to the immediate
right of this node.

- REPEATING: whether this node is repeating or scalar.

- SUBSLST. The local subscript list, a list of node subscripts
associated with the node.

- PRED_LST. The Predecessor Edge list.

- SUCC_LST. The Successor Edge list.

Figure 4.1 The NODE Data Structure


- REDUCED. Whether the dimension is reduced in that
subscript. Only applicable to assertion nodes.

- STOTYP. Whether the dimension is physical or virtual. Used only for
data nodes.

- RANGE. The range set number of the range of this dimension.

- RALP. The real arguments to a range array.

- IDWITH. The nesting level. Determined in scheduling and
used in code generation.

Figure 4.2 A Node Subscript


-  SOURCE.  The  node  number  of  the  predecessor  node. 

-  TARGET.  The  node  number  of  the  successor  node. 

-  DIMDIF.  The  difference  in  dimensionality  between  the  source  and 
target. 

-  SUBX.  The  subscript  expressions  used  in  the  edge,  ordered 
by  position  in  the  predecessor  node. 

Figure  4.3  The  EDGE  Data  Structure 

- LOCAL_SUB#. Local subscript position in the successor node.

- APR_MODE. Subscript expression type
("I", "I-k" (where k>=1), other).

- I_K. The "k", if the APR_MODE is "I-k".

Figure 4.4 Subscript Expression

IV.3 NODE DIMENSIONALITY


Each node in the array graph has zero or more dimensions. For
example, X in the declaration
X IS FIELD (NUMERIC);

has a dimensionality of zero, and V in the declaration
VO IS GROUP (V(10));

has a dimensionality of one. The dimensionality of an assertion node
is set to the union of the dimensionality of all the data nodes in the
assertion. After a data node's dimensionality has been determined
from the data declaration statement, the data usage must be checked in
the assertions to ensure consistency of dimension. In one dialect of
Model, variables may be used without subscript or with an incomplete
subscript list in assertions. The actual number of dimensions is then
constructed from the variable's data declaration, and the node
dimension fields of the variable and the assertions which use it are
augmented appropriately [LuKa82]. In Figure 4.1, the node
dimensionality determines the number of node subscripts in the local
subscript list. Each node subscript contains a further description of
a dimension.

The REDUCED field in a node subscript is used only in an
assertion node. The assertion's node subscript list is the union of
the subscript lists of source variables and of the target variable.
If a subscript appears on the right hand side of the assertion but not
on the left hand side, then we say that the subscript has been
reduced, and the REDUCED field for that subscript for the assertion
node is set to 1.

The  IDWITH  field  is  determined  in  scheduling.  Each  node 
dimension  is  identified  with  a  block  level  when  the  node  dimension  is 
scheduled.  The  block  level  number  is  stored  in  IDWITH. 

RANGE and RALP hold information about the range of the subscript.
Definitions of range and of a range set are given below. The field
STOTYP is used only for data nodes. It records how the node dimension
is represented in the generated program. During scheduling, each node
dimension is determined to be either physical or virtual. If a node
dimension is physical, the STOTYP field is negative. If the dimension
is virtual, the STOTYP field is positive. The absolute value of the
STOTYP indicates the amount of storage required by the data node in
the generated program. The terms physical and virtual are defined
below.
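
The node subscript fields described in this section can be sketched
as a small Python data structure. This is a hypothetical rendering
for illustration only. The class and function names, the field types,
and the order in which the union of subscripts is formed are
assumptions, and RALP is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class NodeSubscript:
    name: str
    reduced: bool = False             # REDUCED: assertion nodes only
    stotyp: Optional[int] = None      # STOTYP: data nodes only
    range_set: Optional[int] = None   # RANGE: range set number
    idwith: Optional[int] = None      # IDWITH: block level, set in scheduling

    def is_physical(self) -> bool:
        """A negative STOTYP marks a physical dimension."""
        return self.stotyp is not None and self.stotyp < 0

    def storage(self) -> int:
        """The absolute value of STOTYP is the storage required."""
        return abs(self.stotyp)

def assertion_subscripts(target_subs: List[str],
                         source_subs: List[List[str]]) -> List[NodeSubscript]:
    """Union of target and source subscripts; a subscript that is
    missing from the target (left hand side) is marked as reduced."""
    ordered = list(target_subs)
    for subs in source_subs:
        for s in subs:
            if s not in ordered:
                ordered.append(s)
    return [NodeSubscript(s, reduced=(s not in target_subs)) for s in ordered]
```

For an assertion such as C(I,J) = A(I,K) * B(K,J), the call
`assertion_subscripts(["I", "J"], [["I", "K"], ["K", "J"]])` yields
three node subscripts, of which only K is marked as reduced.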


IV.4 PRECEDENCE RELATIONSHIPS

The predecessor and successor lists of a node in the dictionary
define the precedence relationships between a node and other nodes in
the specification. If a node Ni is on the predecessor list of a node
N0, we say that there is an edge from Ni to N0. Similarly, if Nj is
on the successor list of N0, then there is an edge from N0 to Nj.

An edge from a node N0 to a node Nj has the following form:

                 t
Nj(U1, ..., Uk) <- N0(J1, ..., Jm)

where t is the edge type, k is the dimensionality of Nj and m is the
dimensionality of N0. The subscript expressions U1, U2, ..., Uk are
stored with the dictionary entry for Nj. J1, J2, ..., Jm are the
subscript expressions associated with dimensions 1, 2, ..., m of N0.
For example, in the assertion

Assertion: A(I) = B(I-2) + B(I-1);

there is a subscript expression "I-2" associated with the first
dimension of an edge from B to Assertion and a subscript expression
"I-1" associated with the first dimension of a second edge from B to
Assertion.

There are three broad categories of precedence relationships:
data dependency, hierarchical, and data parameter. Data dependency
means that data must be generated before it can be used. Therefore,
for each assertion, source variables appear on the predecessor list of
the assertion, and the target variable appears on the successor list.
A hierarchical relationship exists between a higher level data
structure and its components. For example, there is a hierarchical
relationship between a file and the records it contains, or between a
group and its component fields. Therefore, the records of a file
appear on the successor list of the file's dictionary entry, and the
file appears on the predecessor list of each of the records'
dictionary entries. A data parameter relationship exists between a
range array and the nodes whose range it defines. A data parameter
edge is drawn from a range array to the nodes whose ranges it defines.

IV.4.1 The EDGE Data Structure

Each element in a node predecessor or successor list points to an
EDGE data structure (Figure 4.3). The DIMDIF field in this structure
is the dimensionality of the target node minus the dimensionality of
the source node. The EDGE-TYPE indicates the type of precedence
relationship between source and target. There are several edge types
with which we will be concerned. A Type 3 edge is drawn from an
independent variable of an assertion (on the right hand side) to the
assertion. A Type 7 edge is drawn from an assertion to the variable
it defines (the variable on the left hand side). Edge Types 1 and 2
are hierarchical edges drawn from a node in an input file to its
immediate descendants for the Type 1 and from a node in an output file
to its immediate ancestor for the Type 2 edge. Edge Types 13 and 14
are used for the range arrays SIZE and END respectively. They are
drawn from the range array node to the node whose range is being
defined. The Type 8 edge is drawn between siblings in an input or
output file if the nodes concerned are below the record level or if
they belong to sequential files. A Type 21 edge is drawn from the
module to each file. It indicates the precedence of the module over
the file nodes.
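
As an illustration of the data dependency edges, the following Python
sketch builds the Type 3 and Type 7 edges for the two assertions of
the EXAMPLE specification. The `Edge` class, the helper function, and
the table of dimensionalities are assumptions. The assertion names
AASS220 and AASS230 are taken from the comments in Figure 3.3. The
sketch emits one edge per distinct source variable, which may differ
from what the Model System does when a variable is referenced twice.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    edge_type: int
    dimdif: int      # dimensionality of target minus that of source

# Assumed dimensionalities: every node in EXAMPLE is one-dimensional.
DIMS = {"A": 1, "B": 1, "X": 1, "C": 1, "AASS220": 1, "AASS230": 1}

def assertion_edges(name, target_var, source_vars):
    """Type 3 edges from each source variable to the assertion, and a
    Type 7 edge from the assertion to the variable it defines."""
    edges = [Edge(v, name, 3, DIMS[name] - DIMS[v]) for v in source_vars]
    edges.append(Edge(name, target_var, 7, DIMS[target_var] - DIMS[name]))
    return edges

edges = (assertion_edges("AASS220", "X", ["A", "B"])
         + assertion_edges("AASS230", "C", ["X"]))
```

Hierarchical edges (Types 1, 2, and 8), range array edges (Types 13
and 14), and module edges (Type 21) are left out of the sketch.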

In the sequential version of the Model Processor, Type 8 edges
are also drawn from the last field of a repeating structure of an
input sequential file back to the node representing the structure, and
from a repeating structure in an output sequential file to the first
field in the structure. For an input structure, this edge is inserted
because the last field of the previous instance of the structure must
be processed before the next instance of the structure may be
accessed. For the output structure, the edge means that an instance
of a repeating structure may not be written until the previous
instance has been completely written.

However, as we have seen in Chapter 2, data structures are not
accessed and stored on data flow machines in the same way that
structures in sequential files are accessed and stored in conventional
machines. Several instances of a repeating structure may be accessed
concurrently on a data flow machine. Therefore, these two cases of
Type 8 edges are not used in the Model Data Flow Processor.

IV.4.2 The EDGE Subscript Expression List

The final field of the edge data structure, SUBX, is the list of
subscript expressions used in the source. If the subscript expression
is of the form Uq or Uq-c for some constant c, then LOCAL_SUB# is q,
that is, the ordinal number of the subscript in the target. APR_MODE
is the subscript expression type. The system distinguishes seven
types of subscript expressions. A Type 1 subscript expression is a
simple subscript reference, as in A(I). A Type 2 subscript expression
is of the form I-1, as in A(I-1). A Type 3 subscript expression is of
the form I-k for some positive integer k greater than 1, as in A(I-5).
A Type 4 subscript expression, or generalized subscript expression, is
anything other than the first three, for example, A(N*(R+T)). In this
example, N, R, and T are assumed to be variables declared elsewhere in
the specification. The expression "N*(R+T)" is the Type 4 subscript
expression. Types 5-7 are used for indirect indexing vectors. An
indirect indexing vector can be used to hold index values for another
array. The Model Processor detects the use of indirect indexing. The
sequential version of the Scheduler optimizes the generated program
when this feature is used [LuKa82]. If the subscript expression is
Type 3, the I_K field holds the constant offset. For example, for the
reference A(I-5), the value of I_K is 5.
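
The first four subscript expression types can be told apart
mechanically. The Python sketch below classifies an expression given
as text. The function name, the string-based matching, and the return
convention are assumptions, and Types 5-7 (indirect indexing) are not
handled.

```python
import re

def classify(expr, subscript="I"):
    """Return (APR_MODE, I_K) for a subscript expression string.
    I_K is None unless the expression is of Type 3."""
    text = expr.replace(" ", "")
    if text == subscript:
        return 1, None                      # Type 1: simple reference
    m = re.fullmatch(re.escape(subscript) + r"-(\d+)", text)
    if m:
        k = int(m.group(1))
        if k == 1:
            return 2, None                  # Type 2: I-1
        if k > 1:
            return 3, k                     # Type 3: I-k, k > 1
    return 4, None                          # Type 4: anything else
```

With this convention `classify("I - 5")` gives `(3, 5)`, matching the
statement that I_K is 5 for the reference A(I-5).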

IV.5 RANGE SETS

The size of an array dimension is called the range of the
dimension. The system must determine the range of each node
dimension, either as a constant, or in terms of the range arrays
described earlier. If the node is placed by the scheduler into an
iterative block, then the range of a dimension determines the number
of iterations of the node for that dimension. If the node is placed
in a parallel block, the range determines the number of incarnations
of the node which may be active concurrently.

Prior to scheduling, the system determines the range for each
node subscript. It places each node subscript into a range set. All
the node subscripts in a range set have the same range. However, no
two dimensions of a node can be in the same range set. This is
because the range set is the basis for a block of iteration or
duplication, and no two dimensions of the same node can be at the same
block level. The range set number is stored in the RANGE field of the
node subscript.

The  range  of  a  node  dimension  can  be  established  in  one  of 
several  ways.  A  constant  bound  may  be  used  in  declaring  the  array. 
If  the  array  is  part  of  an  input  file,  the  end-of-file  condition 
defines  the  range.  If  the  array  is  referenced  with  a  subscript,  the 
range of the subscript is used to define the array dimension range.
The  SIZE  or  END  qualifiers  may  be  used  to  define  the  range. 

If a range array is used to define a node dimension, the system
stores this information with the dimension. In particular, it records
whether any more significant (to the left) dimensions are used in
computing the range array. These are called the real arguments to the
dimension. When real arguments exist for the node dimension, this
indicates that the ranges corresponding to the real arguments precede
the range of the dimension. Existence of real arguments to a range
array imposes a partial order on the range sets. In generating nested
blocks, the scheduler must observe the partial order. This requires
in the synthesized program that the blocks corresponding to the real
arguments enclose the block corresponding to the range in question.
The real argument list is stored in the RALP field of the node
subscript.
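
One way to picture the partial order is as a topological ordering of
range sets in which every real argument comes before the range that
depends on it. The Python sketch below uses the standard library
`graphlib` module (Python 3.9 or later). The function name and the
dictionary representation are assumptions, and the actual scheduler
builds nested blocks rather than a flat list.

```python
from graphlib import TopologicalSorter

def nesting_order(real_args):
    """real_args maps each range set number to the set of range sets
    that are its real arguments. Returns an outermost-first order in
    which every real argument precedes the range that depends on it."""
    return list(TopologicalSorter(real_args).static_order())

# Hypothetical case: range set 2 depends on 1, range set 3 on 1 and 2.
order = nesting_order({1: set(), 2: {1}, 3: {1, 2}})
```

Here the block for range set 1 would enclose the block for range set
2, which in turn would enclose the block for range set 3.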

IV.6 PHYSICAL AND VIRTUAL DIMENSIONS

A node in the array graph is an aggregate composed of zero or
more dimensions. At each node dimension, the number of instances of
that dimension is determined by the range of the dimension. For
example, let A be a one-dimensional array. Let the range of that
dimension be 5:

A  IS  FIELD  (5)  FIXED  BINARY; 

The  aggregate  Character  of  the  data  node  A  is  interpreted  in  the 
generated  program  as  representing  a  variable  whose  data  type  is 
stream.  A  two-dimensional  data  node  corresponds  in  the  generated 
program  to  a  stream  of  a  stream,  that  is,  a  two  level  stream.  The 
scheduler  determines  the  number  of  levels  of  stream  required  for  the 
variable  in  the  generated  program  from  1)  the  data  node  dimensionality 
and  2)  the  ways  in  which  each  dimension  of  the  data  node  is 
referenced. The scheduler establishes a mapping from each data node
dimension in the array graph to a level of stream for the variable in
the data flow template. The mapping differentiates two possible
cases:

1. A data node dimension is mapped to a stream, and the number
of stream elements is equal to the range of the dimension. In
this case, the node dimension is called physical. The field
STOTYP in the node subscript description is set by the scheduler
to -(range of the dimension). For the above example, the STOTYP
for dimension 1 of A, where A is physical, is set to -5.

2.  A  data  node  dimension  is  mapped  to  one  or  more  representative 
elements.  The  number  of  elements  required  in  the  generated 
program  is  fewer  than  the  range  of  the  dimension.  The  dimension 
is  called  virtual.  The  number  of  elements  representing  the 
dimension  is  called  the  window  into  the  dimension.  In  this  case, 
the  STOTYP  for  the  dimension  is  set  to  the  width  of  the  window. 
If  dimension  1  of  A  is  determined  by  the  scheduler  to  be  virtual 
with  a  window  width  of  1  element,  then  the  STOTYP  for  that 
dimension  is  set  to  1. 

If  a  variable  in  the  generated  program  can  be  represented  by 
fewer  representative  elements  than  dictated  by  the  data  node 
dimensionality,  the  generated  program  is  more  efficient.  Chapter  6 
discusses  efficiency  gained  through  detection  of  virtual  dimensions. 
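
The STOTYP encoding described above can be sketched in a few lines of
Python. This is only an illustration of the sign convention, not the
scheduler's own code; the function names stotyp and is_physical are
hypothetical, and the check that a window holds fewer elements than the
range follows from the definition of a virtual dimension given above.

```python
def stotyp(range_size, window=None):
    """Return the STOTYP value for one node dimension.

    Physical dimension (no window given): -(range of the dimension).
    Virtual dimension: the width of the window into the dimension.
    """
    if window is None:
        return -range_size
    if not 0 < window < range_size:
        # A virtual dimension needs fewer elements than its range.
        raise ValueError("window must be positive and smaller than the range")
    return window


def is_physical(stotyp_value):
    """A negative STOTYP marks a physical dimension."""
    return stotyp_value < 0


# Dimension 1 of A (range 5), first physical, then virtual with window 1.
print(stotyp(5), stotyp(5, window=1))
```

With this convention a single integer field records both the kind of
storage and the number of elements to allocate.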


IV.7 CONCLUSION


This completes the introduction to the Model System. We have
introduced the array graph, a compact representation of the underlying
graph of a specification. We have identified the nodes of the array
graph, and have shown how edges (indicating various types of
precedence relationships) are inserted into the graph. In addition,
the concepts of node dimensionality, range, and physical versus
virtual storage allocation have been defined.

The next two chapters describe the process of scheduling:
translating the array graph to a data flow template. Chapter 5
introduces the process of scheduling and presents a simple algorithm
to generate a data flow template. Chapter 6 discusses methods to
increase efficiency in the generated program.


CHAPTER  V 


SCHEDULING THE ARRAY GRAPH FOR A DATA FLOW MACHINE


V.1 INTRODUCTION

Translation from the array graph to a data flow program is
performed in two steps. Two steps are used in order to
separate the problem of determining the structure of the data flow
program from the problem of implementing that structure on a specific
machine. The array graph is translated first to a data flow program
template. The template is an intermediate form of the data flow
program; it is a machine- and language-independent representation for
a data flow program. The template serves as input for the second step
of translation: to a program for a specific data flow language and
machine at a chosen level (a high level language, assembly language,
or machine language). Since the format of the data flow template is
independent of the object language and machine, the same template can
be translated to programs for different data flow languages and
machines. In our case, the program template is translated to the MaD
Programming Language for the Manchester University data flow machine.
The first step of the translation, from an array graph to a program
template, is called scheduling.

During  scheduling,  nodes  of  the  array  graph  are  merged  into 
components.  A  data  flow  computation  block  is  generated  from  each 
component. In the following section, the source and target
representations of the scheduling process are described. Then a
simple algorithm is defined for translating the array graph to a data
flow  program  template.  With  this  algorithm,  the  components  from  which 
blocks  are  generated  are  the  smallest  consistent  with  the  precedence 
relationships  defined  by  the  array  graph. 

V.2  DATA  STRUCTURES  USED  BY  THE  SCHEDULER 

V.2.1  The  Component  Graph 

The  input  to  the  data  flow  scheduler  is  the  array  graph.  The 
data  structure  representation  of  the  array  graph  is  described  in 
Section IV.1. The scheduler first builds from the array graph a
component graph. The component graph is a more compact representation
of the array graph. Each component corresponds to a Maximally
Strongly Connected Component (MSCC) of the array graph. (An MSCC is a
maximal subgraph of the array graph in which there is a path from any
node to every other node.) The component graph is represented as a
vector. NODELST(I) points to the list of nodes contained in the I'th
component. Each node is described by a GNODE entry. A GNODE has the
following fields:

- NODE_ID. The NODE_ID is the index of the node in the array
graph.

- SUXL. The SUXL is the intra-component list of successors to
the node. Each entry in the successor list of a node describes
an edge in the array graph from the node to another node in the
same component.

- NXT_GNODE. This field is used to link together all the nodes
in a component. A NXT_GNODE entry points to the next node in
the component.
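
The GNODE entry and the NODELST vector can be pictured with the
following Python sketch. It is an illustration under assumptions, not
the scheduler's data layout: the class GNode, the helpers
build_component and walk, and the small three-node graph are all
hypothetical, and edges are represented simply as (source, target)
pairs of array graph node indices.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GNode:
    node_id: int                                    # NODE_ID: index in the array graph
    suxl: List[int] = field(default_factory=list)   # SUXL: successors in the same component
    nxt_gnode: Optional["GNode"] = None             # NXT_GNODE: next node of the component


def build_component(node_ids, edges):
    """Chain the given nodes into one component, keeping only intra-component edges."""
    members = set(node_ids)
    head = None
    for nid in reversed(node_ids):
        suxl = [dst for (src, dst) in edges if src == nid and dst in members]
        head = GNode(nid, suxl, head)
    return head


def walk(head):
    """Follow the NXT_GNODE links of a component."""
    while head is not None:
        yield head
        head = head.nxt_gnode


# Hypothetical array graph: nodes 1 and 2 form a cycle, node 3 hangs off node 2.
edges = [(1, 2), (2, 1), (2, 3)]
nodelst = [build_component([1, 2], edges), build_component([3], edges)]
for i, head in enumerate(nodelst, start=1):
    print(i, [(g.node_id, g.suxl) for g in walk(head)])
```

The edge from node 2 to node 3 crosses components, so it does not
appear in any SUXL.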

V.2.2 The Data Flow Template

The  template  consists  of  two  parts,  data  description  and  block 
description.  The  data  description  entries  define  the  characteristics 
of data to be used in the final program. There are three types of
data: input, output, and interim. The input and output data
description entries are formed from the source and target file
descriptions of the specification. Any data not declared as part of a
source or target file is part of the interim data entry. The data
description entries contain the file nodes of the specification, and a
node for the interim "file." These nodes are represented by their
array graph node numbers. Each of these file nodes is the root of a
generalised tree. Intermediate nodes of the tree represent GROUP or
RECORD nodes. Leaf nodes of the tree correspond to data fields of the
specification. Thus all the descendants of the file can be accessed
from the description of the file node. In the second step of
translation, code generation to a lower level data flow language, data
declarations appropriate to the target language are generated from the
data description entries in the template.


The  second  part  of  the  data  flow  template  is  the  block 
description  section.  A  block  description  entry  contains  information 
needed  to  construct  the  computation  blocks  of  the  final  program.  A 
program  element  indicating  repetition  or  duplication  is  generated  from 
each block description entry. The block description entry also
specifies the dimension and node number of each data node produced as
a result of block evaluation. The final portion of the block
description entry is a list of block members. The members may be
assertions or other blocks nested within the given block. A block
contains the following fields:

- Number. An integer index assigned to the block.

- Type. A block may be of type simple, iterative, or parallel.
A simple block is not to be duplicated or expanded at run time.
It consists of exactly one incarnation. An iterative block is
expanded sequentially at run time. A parallel block is expanded
in parallel at run time. There may be many incarnations of a
parallel block active at the same time during execution of the
data flow program.

-  Level.  The  nesting  level  of  the  block.  The  outermost  block  is 
at  nesting  level  0. 

- Range. This field defines the number of repetitions or
incarnations of a block which will be active as the block is
executed on a data flow machine. For iterative or parallel
blocks, the range field holds the range set number associated
with the block. For simple blocks, the field is 0.

-  Data  nodes.  There  is  one  entry  for  each  data  node  of  type 
field  which  is  defined  as  a  result  of  block  evaluation.  The 
entry  consists  of  three  parts.  The  first  part  is  the  node  number 
of  the  data  node  being  defined  in  the  block.  The  second  part  is 
the ordinal position of the dimension of the data node which is
defined.  For  a  scalar  data  node  this  field  is  zero,  since  a 
scalar  has  zero  dimension.  The  dimensions  are  numbered  (right  to 
left)  from  least  significant  to  most  significant.  The  notation 
(D,i)  is  used  to  refer  to  a  data  node.  The  D  is  the  identifier 
associated  with  the  data  node.  The  i  is  the  dimension  being 
defined.  The  next  section  describes  how  it  is  determined  that  a 
data  node  is  defined  in  a  block. 

- Block members. The member entry consists of two parts. The
first part is the member type. A block member may be an
assertion or another block. The second part is the member
number. If the member is an assertion, the number is the array
graph node number of the assertion. If the member is a block,
the number is the block number.

The  data  flow  template  is  illustrated  in  Figure  5.2. 
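
As a rough picture of a block description entry, the fields listed
above can be written as a Python record. The class name
BlockDescription and its field names are hypothetical; only the field
meanings come from the text. The three entries built at the end
reproduce the template for EXAMPLE given in Figure 5.4, with data nodes
shown by identifier as in the (D,i) notation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

BLOCK_TYPES = ("simple", "iterative", "parallel")


@dataclass
class BlockDescription:
    number: int                  # integer index assigned to the block
    type: str                    # "simple", "iterative", or "parallel"
    level: int                   # nesting level; the outermost block is 0
    range: int                   # range set number; 0 for a simple block
    data_nodes: List[Tuple[str, int]] = field(default_factory=list)  # (D, i) pairs
    members: List[Tuple[str, int]] = field(default_factory=list)     # (member type, number)

    def __post_init__(self):
        if self.type not in BLOCK_TYPES:
            raise ValueError(f"unknown block type: {self.type}")
        if self.type == "simple" and self.range != 0:
            raise ValueError("a simple block has range 0")


# The three blocks of Figure 5.4.
template = {
    1: BlockDescription(1, "simple", 0, 0, [], [("block", 2), ("block", 3)]),
    2: BlockDescription(2, "parallel", 1, 1, [("X", 1)], [("assertion", 1)]),
    3: BlockDescription(3, "parallel", 1, 1, [("C", 1)], [("assertion", 2)]),
}
for block in template.values():
    print(block)
```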


V.3  A  SIMPLE  SCHEDULING  ALGORITHM 


Because Model follows the data-driven semantics of the data flow
computation model (see Section III.4), a program template can be
synthesized easily from the component graph. The data description
section is created by placing each node of a component which is of
type file into this section of the template. The block description
section of the template is produced by the procedure SCHEDULE.
SCHEDULE calls on a mutually recursive pair of procedures to perform
the actual scheduling. The first procedure, SCHEDULE_GRAPH, is given
a component graph as input and produces the block description entries
for all the components. SCHEDULE_GRAPH calls on SCHEDULE_COMPONENT to
generate a block description entry for each component of the component
graph. If the component C contains more than one array graph node,
SCHEDULE_COMPONENT may delete certain intra-component edges, and
recursively call SCHEDULE_GRAPH with the component C as parameter. If
intra-component edges were deleted, C may no longer be an MSCC.
Therefore, SCHEDULE_GRAPH can perform the same actions on this
subgraph as it did on the original graph. The simple scheduling
algorithm is described in greater detail below. The process is
illustrated with the EXAMPLE specification (Figure 3.3) and array
graph (Figure 4.5).

The procedure SCHEDULE first performs initialization of data
structures needed by SCHEDULE_GRAPH and SCHEDULE_COMPONENT. SCHEDULE
then calls SCHEDULE_GRAPH to synthesize a block description entry for
the entire specification. This entry corresponds to the outermost
block of the generated MaD program.

V.3.1 Initialization

A component graph is constructed from the array graph. The
component graph is viewed initially as consisting of only one
component. Each node of the array graph is a member of the single
component. The SUXL field of a node in the component is built from
the successor list of the corresponding node in the array graph.
SCHEDULE allocates space for a block description entry for the initial
component. During initialization, some of the fields of this entry
are defined. The block is of type simple since there is only one
incarnation of the entire program. The nesting level is 0. The data
node portion of the block description entry is constructed. A data
node in the component is added to the data node section of the block
description entry for the initial component if the data node is a
field, has zero dimension (a scalar), and is the target of some
assertion. The block description entry for EXAMPLE at this point is
as follows:

- Type = Simple

- Level = 0

- Range = 0

- Data nodes = null (there are no scalar data nodes)

SCHEDULE then calls SCHEDULE_GRAPH. Call parameters to SCHEDULE_GRAPH
are the initial component graph, the block description entry, and the
nesting level (1 + the current nesting level = 1).

V.3.2 Procedure SCHEDULE_GRAPH

SCHEDULE_GRAPH receives as input a component graph and a block
description entry. It produces as output block members for the block
description entry. SCHEDULE_GRAPH performs the following actions:

1. Find the MSCC's of the component graph. An MSCC of the
component graph is a set of array graph nodes which are maximally
strongly connected. The component graph is modified to reflect
the MSCC's. Figure 5.3a illustrates the initial component graph
for module EXAMPLE. After the MSCC's of this component graph
have been found, the new component graph contains two components.
The new graph is illustrated in Figure 5.3b.

2. Call the procedure SCHEDULE_COMPONENT once for each component
of the component graph. SCHEDULE_COMPONENT adds one block member
and one or more data nodes to the block description entry for the
component.

3.  Return  a  completed  block  description  entry. 
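
Step 1 can be sketched as follows. The text does not say which
algorithm the scheduler uses to find MSCC's; Tarjan's algorithm is
assumed here purely for illustration. The function name find_msccs and
the small graph are hypothetical: two assertions, ASS1 and ASS2, define
A and B in terms of each other, and both read C.

```python
def find_msccs(nodes, successors):
    """Return the maximally strongly connected components (Tarjan's algorithm).

    successors maps each node to the list of its successor nodes.
    """
    index, low = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in successors.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a component: pop its members off the stack.
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            components.append(sorted(component))

    for v in nodes:
        if v not in index:
            visit(v)
    return components


# Hypothetical graph: B, C -> ASS1 -> A and A, C -> ASS2 -> B.
graph = {"C": ["ASS1", "ASS2"], "ASS1": ["A"], "A": ["ASS2"],
         "ASS2": ["B"], "B": ["ASS1"]}
print(find_msccs(["C", "ASS1", "A", "ASS2", "B"], graph))
```

The recursive form is adequate for small graphs; a large array graph
would call for an iterative version to stay within Python's recursion
limit.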


V.3.3  Procedure  SCHEDULE_COMPONENT 

Input to SCHEDULE_COMPONENT are a single component, Ci, from the
component graph; a block description entry, BD; and the nesting
level, NL. The output is a modified block description entry:
information about a new block member is inserted into BD. This
process is discussed below.

SCHEDULE_COMPONENT searches the nodes of Ci for node dimensions
which have not yet been processed. A dimension of a node which has
not been processed is called an unscheduled dimension. The procedure
first finds the minimum number of unscheduled dimensions of any
node in the component. It then performs the following analysis:

Let Ci be the component being analyzed, and |Ci| denote the
number of nodes in Ci. Let MINFREE be the minimum number of
unscheduled dimensions.

Case 1. If |Ci| = 1 and MINFREE = 0:

If the node type of the single node in the component is
"assertion", then return the node as the block member of the
block  description  entry.  The  member  type  is  "assertion."  The 
member  number  is  the  array  graph  node  number  of  the  single  node 
in  the  component.  If  the  node  is  not  an  assertion  node,  then  a 
null  entry  is  returned. 

Case 2. If MINFREE > 0: This indicates that there is at least
one unscheduled dimension in each node in the component.
SCHEDULE_COMPONENT attempts to find a range common to an
unscheduled dimension of each node. It verifies that the
dimension corresponding to the range chosen, the distinguished
dimension, is in a consistent position. Consider, for example:

Assertion: A(I,J) = A(I,J-1) + A(J,I)

Ci contains two nodes, A and Assertion. Each node has two
dimensions. Assume that SCHEDULE_COMPONENT chooses to schedule
the least significant dimension (J). However, there is a
conflict in the choice of dimension for A. The least significant
dimension of A in one case corresponds to the range set of I and
in the other case corresponds to the range set of J. The same
problem occurs if SCHEDULE_COMPONENT chooses to schedule the most
significant dimension (I). In either case, there is
inconsistency in the position of the chosen dimension. This
anomaly is discussed below. For now, we assume that a common
range is located, and that the distinguished dimension is in a
consistent position in each node. If a common range can be found
and the component contains at least one assertion,
SCHEDULE_COMPONENT can create a new block from the component.
After the block is constructed, the block description entry for
this new block, BD1, is returned to SCHEDULE_GRAPH as a member of
BD. The member type is "block," and the member number is the
index assigned to BD1 in the data flow template. This new block
is nested in the block which SCHEDULE_GRAPH is building. The
data structure of a block is described above. The values of the
fields in a block are defined as follows:

a) Block number. The block number is the index of the next
available template entry.

b) Block type. If there are any edges with subscript
expressions of Types 2 or 3 in the position of the
distinguished dimension, the block is of type iterative.
Otherwise the block is of type parallel.

c) Block level. The block level is set to the current
nesting level, a call parameter.

d) Block range. This is the range set number of the range
found common to an unscheduled dimension of each node.

e) Data nodes defined in the block. For each assertion in
the component, the array graph node number of the target of
the assertion is added to the data node portion of the block
description entry. The ordinal position of the
distinguished dimension is added as the dimension of the
data node being defined.

f) Block members. SCHEDULE_COMPONENT deletes all edges with
subscript expressions of Types 2 or 3 in the distinguished
dimension. It then calls SCHEDULE_GRAPH recursively with
the resultant subgraph, BD1, and (1 + the current nesting
level) as parameters. SCHEDULE_GRAPH returns all the
members of the block. Each member may be an assertion or
may itself be a block.
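
Steps a) through f) can be condensed into a short Python sketch. It is
a simplification under stated assumptions: the function name
build_block is hypothetical, a block is shown as a plain dictionary,
each edge is a (source, target, subscript type) triple for the
distinguished dimension, and the subscript types are treated as opaque
labels 1, 2, and 3. The recursive call to SCHEDULE_GRAPH is left out;
the function only returns the edges that would be passed to it.

```python
def build_block(number, level, range_set, dim_position, assertions, edges):
    """Fill in a block description entry for Case 2.

    assertions: list of (assertion node number, target data node number).
    edges: list of (source, target, subscript_type) triples.
    Returns the new block and the edges left after deleting Types 2 and 3.
    """
    has_type_2_or_3 = any(kind in (2, 3) for (_, _, kind) in edges)
    block = {
        "number": number,                                            # a)
        "type": "iterative" if has_type_2_or_3 else "parallel",      # b)
        "level": level,                                              # c)
        "range": range_set,                                          # d)
        "data_nodes": [(target, dim_position) for (_, target) in assertions],  # e)
        "members": [],   # f) filled in by the recursive scheduling call
    }
    remaining = [edge for edge in edges if edge[2] not in (2, 3)]
    return block, remaining


# Hypothetical component: assertion node 7 defines data node 4.
edges = [(4, 7, 2), (5, 7, 1), (7, 4, 1)]
block, remaining = build_block(2, 1, 1, 1, [(7, 4)], edges)
print(block["type"], remaining)
```

Deleting the Type 2 and 3 edges is what allows the recursive call to
find smaller MSCC's inside the block.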

Cases 1 and 2 can be illustrated with the component graph of the
module EXAMPLE. Let SCHEDULE_COMPONENT be called with component
Ci = Assertion 1. Ci has one unscheduled dimension, the
dimension corresponding to the range of I. This is the
distinguished dimension for Ci. SCHEDULE_COMPONENT creates a new
block description entry BD1 for Ci. Since there are no Type 2 or
3 edges, the block type is parallel. The block level is the
nesting level, NL. The range is the range set number of the
range set containing I. There is one assertion in Ci. The target
of the assertion is X. Therefore, (X,1) is the data node defined
in this new block. The '1' refers to the ordinal position of the
dimension of X being defined, in this case, the first. This
dimension of X is marked as processed, and SCHEDULE_GRAPH is
called with parameters Ci, BD1, and NL = 2. Since Ci is an MSCC,
SCHEDULE_GRAPH calls SCHEDULE_COMPONENT with parameters Ci, BD1,
and NL = 2. Now Case 1 applies since |Ci| = 1 and there are no
unscheduled dimensions. The node in Ci is of type "assertion,"
so the node is inserted into BD1 as a member of BD1 of type
"assertion."

SCHEDULE_GRAPH also calls SCHEDULE_COMPONENT later on with
Ci = Assertion 2. The sequence outlined above is repeated.
Another member BD2 is added to the block description entry BD.
BD2 is defined similarly to BD1. The block type is parallel, the
level is 1, and the range is the range of I. The data node (C,1)
is defined in BD2. Figure 5.4 shows the completed data flow
template for EXAMPLE.


File Description: (INFILE1, INFILE2, OUTFILE, INTERIM)

Block Description:

Block 1: Type=Simple Level=0 Range=0
Data Nodes: None
Block Members: ((m_name=Block 2 m_type=block)
(m_name=Block 3 m_type=block))

Block 2: Type=Parallel Level=1 Range=1
Data Nodes: ((d_name=X d_dim=1))
Block Members: ((m_name=Assertion 1 m_type=assertion))

Block 3: Type=Parallel Level=1 Range=1
Data Nodes: ((d_name=C d_dim=1))
Block Members: ((m_name=Assertion 2 m_type=assertion))

Figure 5.4 The Template for EXAMPLE

Case 3. If |C| > 1 and MINFREE = 0: This case represents a
cycle in the array graph which the scheduler is unable to
eliminate. The cycle may indicate a true cyclic dependency which
would cause the data flow program to hang in execution. An
example of such a dependency is the following pair of assertions.

ASS 1: A(I) = B(I) + C(I);

ASS 2: B(I) = A(I) - C(I);

In this example, the first assertion can only be evaluated when
the value of array B has been defined. However, the value of
B depends on A. Figure 5.5 shows the graph for this example.
Since the value of B depends on the value of A and the value of A
depends on the value of B, neither assertion can be evaluated.


However, in other cases, a cycle in the graph does not
necessarily indicate that the corresponding data flow program
would hang. For example, the following assertions can be
evaluated:

A(I) = IF I = D THEN B(I) + C(I) ELSE C(I);

B(I) = IF I = E THEN A(I) - C(I) ELSE C(I);

Here, the only element of A which depends on B is A(D). The
value of D is available, however, only at run time. Let the
run time value of D be 5, and the run time value of E be some index
other than 5. Then, in this case, B does not depend on A(D).
The data flow program executes as follows: Each element of A
except A(5) can be evaluated as C becomes available. Once these
values of A are defined, B can be evaluated. B(5) receives a
value as soon as C(5) is available. After B(5) is defined, A(5)
can be evaluated. Therefore, each element of A and each element
of B can be evaluated. The program does not hang. The cycle in
the graph cannot be eliminated because it is not always
determinable at compile time that individual array elements will
be defined before they are used.
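
The difference between the two pairs of assertions can be checked with
a small Python simulation of the conditional pair. It is only a sketch:
it evaluates elements on demand with memoization rather than by the
token firing rule of a data flow machine, and the function name
evaluate, the contents of C, and the choice E = 2 are assumptions made
for illustration. A request for an element that is already being
computed is reported as a true cyclic dependency.

```python
def evaluate(C, D, E):
    """Evaluate A(I) = IF I = D THEN B(I) + C(I) ELSE C(I)
    and B(I) = IF I = E THEN A(I) - C(I) ELSE C(I) element by element."""
    A, B = {}, {}
    in_progress = set()

    def get(name, i):
        table = A if name == "A" else B
        if i in table:
            return table[i]
        if (name, i) in in_progress:
            raise RuntimeError(f"true cyclic dependency at {name}({i})")
        in_progress.add((name, i))
        if name == "A":
            value = get("B", i) + C[i] if i == D else C[i]
        else:
            value = get("A", i) - C[i] if i == E else C[i]
        in_progress.discard((name, i))
        table[i] = value
        return value

    for i in C:
        get("A", i)
        get("B", i)
    return A, B


C = {i: 10 * i for i in range(1, 7)}   # hypothetical input values
A, B = evaluate(C, D=5, E=2)           # D and E differ: every element resolves
print(A[5], B[2])
```

When D and E hold the same index, A(D) and B(D) depend on each other
and the simulation raises an error, which corresponds to the program
hanging on the data flow machine.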

The cases described above cause SCHEDULE_COMPONENT to detect
that |C| > 1 and MINFREE = 0. The sequence of events which causes
this is as follows:

a) SCHEDULE_GRAPH calls SCHEDULE_COMPONENT with the graph of
Figure 5.5.

b) SCHEDULE_COMPONENT finds an unscheduled dimension common
to all the nodes in the component, the dimension
corresponding to the range of I. In this case, |C| > 1 and
MINFREE > 0. Therefore, SCHEDULE_COMPONENT marks the
distinguished dimension of each node in the component as
scheduled, and attempts to delete any edges with Type 2 or
Type 3 subscript expressions. In this case, there are none.
It then calls SCHEDULE_GRAPH with this component as a new
subgraph. SCHEDULE_GRAPH attempts to find the MSCC's of
this subgraph. Since no edges were deleted, there is still
only one MSCC. SCHEDULE_GRAPH calls SCHEDULE_COMPONENT with
this component as parameter. Now SCHEDULE_COMPONENT
discovers that |C| > 1 and, since the one dimension of each
node has already been scheduled, MINFREE = 0.

Since the cycle is potentially resolvable when the program is run
on a data flow machine, SCHEDULE_COMPONENT continues with
scheduling. It reports a warning to the user of a possible cycle
in the graph. It then deletes an arbitrary edge from the
component, and calls SCHEDULE_GRAPH with the component. Deletion
of the edge may result in an acyclic subgraph which
SCHEDULE_GRAPH can then handle in the normal manner. For
example, in Figure 5.5, if the edge from the second assertion to
B is deleted, the graph is acyclic. If deletion of the first
edge chosen does not produce an acyclic graph, the process will
be repeated recursively until an acyclic graph is produced.

Case  4.  MINFREE  >  0  but  no  common  range  can  be  found:  This  case 
was  introduced  earlier.  There  is  no  range  for  which  a  consistent 
dimension can be located in each node. This case is similar to
Case 3. However, with the present implementation of structured
data on the Manchester machine, the program generated from this
graph cannot run successfully. A structure must be completely
defined  in  one  block  before  it  is  available  to  be  accessed  by 
instructions  in  other  blocks.  The  graph,  however,  indicates 
cyclic  dependency.  Generating  an  element  in  one  array  depends  on 
an  arbitrary  value  in  another  array,  and  vice  versa  for  the  other 
array.  Therefore,  an  element  from  each  array  is  being  selected 
before  the  entire  array  is  defined.  In  addition,  there  is  no 
consistent  range,  so  the  assertions  defining  the  two  arrays 
cannot  be  included  in  the  same  block. 

To warn the user of this problem, SCHEDULE_COMPONENT issues
an error message reporting a cycle in the graph, and makes no
change to the block description entry. The graph cannot be
scheduled.
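
The four cases analyzed by SCHEDULE_COMPONENT can be summarized as a
small decision function. This sketch is a reading aid only: the names
minfree and classify are hypothetical, and the search for a common
range in a consistent position is reduced to a flag supplied by the
caller.

```python
def minfree(unscheduled):
    """Minimum number of unscheduled dimensions of any node in the component.

    unscheduled maps each node to the list of its unscheduled dimensions.
    """
    return min(len(dims) for dims in unscheduled.values())


def classify(unscheduled, common_range_found):
    """Return the case number (1 to 4) that applies to a component."""
    size = len(unscheduled)
    free = minfree(unscheduled)
    if free == 0:
        # Case 1: single fully scheduled node. Case 3: unresolved cycle.
        return 1 if size == 1 else 3
    # Case 2: a block can be built. Case 4: no consistent common range.
    return 2 if common_range_found else 4


# Assertion 1 of EXAMPLE before and after its dimension over I is scheduled.
print(classify({"Assertion 1": ["I"]}, True))
print(classify({"Assertion 1": []}, False))
```

Case 3 produces a warning and scheduling continues; Case 4 produces an
error and the graph is not scheduled.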


V.4  CONCLUSION 


This concludes the discussion of the simple scheduling algorithm.
The next chapter describes modifications to the algorithm to increase
efficiency in execution time and in storage requirements on the data
flow machine.


CHAPTER  VI 


SCHEDULING  II:  EFFICIENCY  CONSIDERATIONS 

VI.1 INTRODUCTION

The algorithm described in the previous chapter produces a data
flow  template  from  an  array  graph.  However,  it  is  possible  to  further 
analyze  the  array  graph  and  to  produce  a  template  from  which  a  more 
efficient  program  can  be  generated.  The  algorithm  described  below 
merges  components  so  that  the  generated  block  contains  more  elements 
than  in  the  simple  algorithm. 

The dimensions along which to measure efficiency of a data flow
program are still being formulated. Most designs of data flow
computers are on paper or in very early prototype stage. One study of
the performance of a proposed machine has been done using simulation
[Gost80]. Prototypes which have been built have not been studied
extensively in terms of efficiency in programming them.


We can demonstrate in a data flow scheduler, therefore, some
optimizations which seem appropriate 1) from results obtained from
running programs on a specific machine and 2) from a comparative study
of several different machines (see Chapter 2). Two general areas of
optimization are pursued here. These areas are block enlargement and
data structure simplification.

VI.2 BLOCK ENLARGEMENT

The  first  area  of  optimization  is  to  enlarge  the  size  of  a  block 
generated  by  the  scheduler.  The  size  of  a  block  is  the  number  of 
block  members.  Enlarging  the  size  of  a  block  generates  a  more 
efficient  data  flow  program  in  two  ways.  The  first  reason  that  having 
one  large  block  may  be  more  efficient  than  having  several  small  blocks 
has  to  do  with  reducing  the  number  of  allocations  of  program  units  to 
processors  in  the  data  flow  machine. 

A data flow machine is composed of one or more processors
communicating  through  an  interconnect.  An  individual  processor  has 
only  local  storage  for  programs  and  data  tokens.  There  is  no  memory 
shared  by  all  the  processors  in  the  machine.  If  data  produced  in  one 
processor  is  needed  by  an  instruction  in  another  processor,  that  data 
must  be  transmitted  along  the  communication  path  to  the  other 
processor.  The  transmission  may  require  routing  through  one  or  more 
intermediate links. To minimize the cost of data transmission between
processors in the machine, the scheduler follows the principle of
locality of reference. The scheduler may enlarge the scope of units
of allocation so that a program unit contains a related set of
instructions: that is, data produced by an instruction in the program
unit is used by other instructions in the same unit.

The other way in which generating one large block produces a more
efficient  program  than  generating  several  small  blocks  has  to  do  with 
cost  of  transmitting  streams  from  one  block  to  another.  This  cost 
occurs  whether  or  not  the  blocks  are  located  in  the  same  processor. 
On  the  Manchester  machine,  the  cost  is  manifested  in  the  work 
associated  with  generating  and  updating  token  labels. 

Each token generated or referenced in a MaD language block has a
label which identifies it as being in a unique block instance. When a
token enters a block, the old Activation Name (AN) and Iteration Level
(IL) fields of the token are replaced with new AN and IL fields. The
new fields indicate that the token has a new context: the context of
the associated block. Variables local to a block, that is, generated
and used completely within the scope of the block, are also labeled
with the new AN and IL. A token exiting the block must have the old
AN and IL inserted into the label fields, indicating that the token is
no longer associated with the block. When a stream of tokens is
generated within a block, the label of each token exiting the block
must be changed. This is accomplished by generating two label
streams, each with the same number of elements as the data stream.
There is one label stream for the AN and one for the IL. Pairs of
elements (Data and AN) are input to a Set Activation Name instruction
to update the AN. Pairs of elements (Data and IL) are input to a Set
Iteration Level instruction to update the IL. The cost of having a
data  stream  exit  the  block  is  that  two  label  streams  must  be 
generated,  and  the  two  fields  of  each  data  token  must  be  updated. 
Therefore  it  is  advantageous  to  detect  data  which  is  local  to  a  block. 
If  a  variable  is  local  to  a  MaD  language  block,  it  need  not  exit  the 
block.  Therefore  the  label  fields  need  not  be  restored  to  another 
context. 
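
The relabeling cost described above can be made concrete with a small Python sketch. It is only a model of the bookkeeping, not Manchester machine code: the Token class and the functions enter_block and exit_block are hypothetical names introduced for illustration, and each label update is simply counted as one operation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Token:
    value: int
    an: int  # Activation Name field of the label
    il: int  # Iteration Level field of the label


def enter_block(tok, block_an, block_il):
    """Give a token the context (AN, IL) of the block it is entering."""
    return replace(tok, an=block_an, il=block_il)


def exit_block(data_stream, an_stream, il_stream):
    """Restore the outer labels of a stream leaving a block.

    Each data token is paired with one element of the AN label stream and
    one element of the IL label stream, so two updates are made per token.
    """
    relabeled, label_ops = [], 0
    for tok, an, il in zip(data_stream, an_stream, il_stream):
        tok = replace(tok, an=an)  # models Set Activation Name
        tok = replace(tok, il=il)  # models Set Iteration Level
        label_ops += 2
        relabeled.append(tok)
    return relabeled, label_ops


outer_an, outer_il = 1, 0
inside = [enter_block(Token(v, outer_an, outer_il), 7, 1) for v in (10, 20, 30)]
outside, ops = exit_block(inside, [outer_an] * 3, [outer_il] * 3)
```

In this model a three-element stream that leaves the block costs six label updates, while a variable that stays local to the block costs none.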

Enlarging  the  scope  of  a  block  (or  program  unit)  is  accomplished 
by  merging  adjacent  components  in  the  component  graph.  Component 
merging  to  enlarge  a  block  can  be  illustrated  with  the  EXAMPLE 
specification.  Figure  6.1a  shows  a  portion  of  the  component  graph  for 
the  specification  (the  file  nodes  have  been  omitted).  Each  component 
contains  a  single  node.  Application  of  the  simple  scheduling 
algorithm  to  this  graph  results  in  a  data  flow  template  with  two  block 
description  entries  in  addition  to  the  outer  block  (the  outer  block  is 
the block representing the entire program). In one block, Assertion 1
is the sole member, and X is the data node being defined. In the
other block, Assertion 2 is the sole member, and C is the data node
being defined. If, however, adjacent components are merged to create
a single component containing multiple nodes, the data flow template
contains only one block. Figure 6.1b illustrates the component graph
with merged components. If this component is given as a parameter to
SCHEDULE_COMPONENT, the procedure returns two block members in the
data flow template entry, one corresponding to Assertion 1, and the
other corresponding to Assertion 2. This block is a larger unit of
allocation  than  the  two  blocks  produced  from  the  component  graph 
without  merged  components.  Figure  6.2  shows  the  block  description 
entries  produced  from  the  component  graph  of  Figure  6.1b. 


File Description: (INFILE1, INFILE2, OUTFILE, INTERIM)

Block Description:

Block 1: Type=Simple Level=0 Range=0
  Data Nodes: None
  Block Members: ((m_name=Block 2 m_type=block))

Block 2: Type=Parallel Level=1 Range=1
  Data Nodes: ((d_name=X d_dim=1) (d_name=C d_dim=1))
  Block Members: ((m_name=Assertion 1 m_type=assertion)
                  (m_name=Assertion 2 m_type=assertion))

Figure 6.2 The New Template for EXAMPLE

There is a trade-off involved in deciding whether to
merge a component containing cycles with a cycle-free component. In
many  cases,  an  iterative  block  is  generated  from  a  cycle-containing 
component.  A  parallel  block  is  generated  from  a  cycle-free  component. 
If  the  two  types  of  components  are  merged,  the  block  generated  from 
the  resultant  component  is  iterative.  Merging  the  two  types  of 
components  enlarges  the  scope  of  the  generated  block,  which  is  a 
desirable  optimization.  However,  the  block  so  created  is  an  iterative 
block.  Potential  parallel  computation  from  the  cycle-free  component 
could  not  be  exploited  on  some  data  flow  machines.  Merging  the  two 
could,  therefore,  result  in  a  decrease  of  parallelism  in  the  data  flow 
template,  and  in  the  generated  program.  The  scheduler's  objective  is 
to  provide  as  much  information  as  is  available  at  compile  time  in  the 
data flow template. Combining an iterative and a parallel block into
one iterative block causes information about parallelism in the
program to be lost. If the two are kept distinct, the information is
still available. The run time processor allocation program can still
allocate  an  iterative  block  and  a  parallel  block  to  the  same  processor 
if  a  run  time  evaluation  of  locality  issues  so  dictates.  The 
scheduler  therefore  does  not  merge  a  cycle-containing  component  (which 
might  produce  an  iterative  block)  with  a  cycle-free  component 
(potentially  a  parallel  block). 

VI.3 DATA STRUCTURE SIMPLIFICATION

The  second  area  of  optimization  is  data  structure  simplification. 
Data flow machines, which typically support the transmission of data
values of elementary type, need special hardware to handle data
structures. The support hardware may take the form of an auxiliary
structure controller or of additional machine instructions to
manipulate structured data (see Section II.3). The data flow
scheduler therefore attempts to simplify the data structure of
variables  declared  in  the  specification.  The  scheduler  simplifies  a 
variable's  data  structure  by  reducing  the  number  of  elements  required 
in  the  generated  program  to  represent  a  node  dimension. 

A  data  node  dimension  is  defined  to  be  physical  if  the  dimension 
is  mapped  to  a  stream,  and  the  number  of  elements  in  the  stream  is 
equal  to  the  range  of  the  dimension.  A  data  node  dimension  is  defined 
to  be  virtual  if  the  dimension  is  mapped  to  a  "window"  of  elements, 
and the width of the window is smaller than the range of that
dimension. If a dimension can be recognized to be virtual, there may
be considerable savings in the generated data flow program. There is
substantial cost (demonstrated on the Manchester machine) in creating
and accessing a stream.

The MAD language defines two varieties of streams: token streams
and stored streams. A token stream can be thought of as a number of
tokens traveling down the same arc [Bowe81]. The tokens are
distinguished  by  the  index  field  of  the  token  label.  A  stored  stream 
follows  more  along  the  lines  of  a  conventional  array.  The  stream  is 
stored  in  Matching  Store  and  represented  by  a  single  "context"  token, 
or  pointer.  Recognizing  that  a  dimension  of  either  type  of  stream  is 
virtual  may  produce  a  more  efficient  data  flow  program. 

The  cost  of  processing  a  token  stream  is  as  follows:  When  the 
stream  is  created,  the  index  field  of  the  label  of  each  element  must 
be  initialized.  To  select  an  element  from  the  stream,  a  new  stream 
must  be  created  to  hold  the  index  of  the  element  selected.  This  new 
stream,  which  has  as  many  elements  as  the  token  stream,  is  matched 
against  the  token  stream.  When  the  value  of  the  index  stream  matches 
the  index  of  the  data  stream,  the  element  has  been  located. 
Therefore,  it  is  advantageous  to  have  as  few  elements  as  possible  in 
the token stream. If a data node dimension can be identified as
virtual in the least order dimension, then it may be possible to
represent the node in the data flow program as one or more scalars
rather than as a stream.


It  is  also  advantageous  to  find  virtual  dimensions  in  a  node 
which  is  generated  into  a  stored  stream.  A  stored  stream  occupies 
space  in  the  Matching  Store.  If  a  data  node  dimension  can  be  found  to 
be virtual, fewer storage locations are needed in the Matching Store.
This is beneficial, since overflow of the Matching Store is not
recoverable.

VI.3.1 Virtual Dimensions For Local Data Nodes

One  example  of  a  variable  with  a  virtual  dimension  is  the  data 
node  X  in  Figure  6.2.  X  is  local  to  Block  BD1.  A  data  node  is  local 
to  a  block  if  the  data  node  is  produced  by  an  assertion  in  the  block, 
and  the  node  is  used  only  by  assertions  within  the  block.  Since  X  is 
produced by Assertion 1, which is a member of BD1, and X is used by
Assertion 2, also a member of BD1, X is local to BD1. There are as
many instances of Block BD1 as the range of I. Only the instance of X
corresponding to the block instance is needed in the block instance.
For example, only X(7) is needed in the seventh instance of BD1.
Therefore the dimension of X associated with the range set of I can be
marked virtual. In the data flow program which is generated from the
template, X can be declared as a scalar local to BD1.


VI.3.2 Virtual Dimensions in Iterative Blocks

Another  example  in  which  a  data  node  dimension  is  virtual  is 
within  an  iterative  block.  If  a  data  node  is  defined  by  a  recurrence 
relation,  that  is,  each  element  of  the  array  is  defined  in  terms  of 
elements  of  lower  index,  and  only  the  last  element  of  the  array  is 
used  in  other  assertions,  then  the  dimension  corresponding  to  the 
recurrence  may  be  marked  virtual.  The  Factorial  example  illustrates 
this  situation: 

Assertion 3: FACTORIAL(I) = IF I = 1 THEN 1;
                            ELSE I * FACTORIAL(I-1);

Assertion 4: OOTT = FACTORIAL(SIZE.FACTORIAL);

Only  the  final  value  of  the  factorial  iteration  is  needed  to  define 
the  value  of  OOTT.  Therefore,  the  dimension  may  be  marked  virtual. 
In this case, the subscript expression in Assertion 3 is Type 2,
"I-1". This indicates that only two successive elements of the array
are  required  at  any  one  time.  Therefore,  a  "window"  of  2  array 
elements  is  required.  In  general,  a  window  of  k+1  elements  is 
required  for  "I-k"  subscript  expressions. 
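
The window idea can be sketched in Python as follows. This is an illustration only, not the code the scheduler generates; the function run_recurrence and its parameters are hypothetical names. It evaluates a recurrence whose subscripts reach back at most k positions while retaining only k+1 elements, and reports the largest number of elements ever held.

```python
from collections import deque


def run_recurrence(size, k, step, initial):
    """Evaluate A(1..size) where A(I) depends on A(I-1) .. A(I-k).

    Only a window of k+1 elements is retained. Returns the last
    element, A(size), and the largest window length that occurred.
    """
    window = deque(maxlen=k + 1)  # the oldest element drops out automatically
    peak = 0
    for i in range(1, size + 1):
        if i <= len(initial):
            window.append(initial[i - 1])   # base cases of the recurrence
        else:
            window.append(step(i, window))  # uses only retained elements
        peak = max(peak, len(window))
    return window[-1], peak


# FACTORIAL(I) = IF I = 1 THEN 1 ELSE I * FACTORIAL(I-1), so k = 1.
fact_last, fact_peak = run_recurrence(5, 1, lambda i, w: i * w[-1], [1])

# A recurrence with "I-2" subscripts (Fibonacci) needs a window of 3.
fib_last, fib_peak = run_recurrence(10, 2, lambda i, w: w[-1] + w[-2], [1, 1])
```

For the Factorial case the array of five elements is never materialized: two elements are held at any one time, and only the last one is passed on.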


VI.4 EXPERIENCE WITH THE MANCHESTER MACHINE


Experiments  done  as  a  part  of  this  work  on  the  Manchester  data 
flow  machine  confirm  that  block  enlargement  and  data  structure 
simplification result in more efficient data flow programs. One
series of experiments was performed with specification EXAMPLE. The
data flow template based on Figure 5.4 was translated to MAD and
compiled. The execution run resulted in the following run time
statistics:


TOTAL NUMBER OF INSTRUCTIONS EXECUTED : SI = 624

ASSUMING : 1. UNLIMITED NO. OF PROCESSORS
           2. ALL INSTRUCTION EXECUTION TIMES = 1 STEP
TOTAL NUMBER OF PROCESSING STEPS : SINF = 148

AVERAGE PARALLELISM OF THE PROGRAM : SI/SINF = 4

9 RESULT TOKENS WRITTEN
865 TOKENS PASSING THROUGH THE RESULT QUEUE


Then the template from Figure 6.2 was translated to MAD, compiled, and
run  on  the  emulator.  In  this  template,  the  block  was  enlarged  to 
include  both  assertions;  X  was  recognized  to  be  a  local  data  node; 
and  dimension  1  of  X  was  marked  virtual.  The  following  results  were 
obtained  from  running  the  second  version  of  the  program: 


TOTAL NUMBER OF INSTRUCTIONS EXECUTED : SI = 446

ASSUMING : 1. UNLIMITED NO. OF PROCESSORS
           2. ALL INSTRUCTION EXECUTION TIMES = 1 STEP
TOTAL NUMBER OF PROCESSING STEPS : SINF = 133

AVERAGE PARALLELISM OF THE PROGRAM : SI/SINF = 3

9 RESULT TOKENS WRITTEN
616 TOKENS PASSING THROUGH THE RESULT QUEUE


The second program executed 178 fewer instructions (a 28% improvement)
and completed in 15 fewer processing steps (a 10% improvement). 249
fewer tokens passed through the token queue (a 28% improvement).
Enlarging  the  block  and  simplifying  the  data  structure  of  X  resulted 
in  a  computation  which  completed  faster  (fewer  processing  steps), 
required  that  fewer  instructions  be  executed,  and  generated  fewer 
tokens  to  flow  through  the  ring.  Similar  results  were  obtained  with 
the  Factorial  program. 
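
The figures quoted above follow directly from the two sets of run time statistics. The short Python check below recomputes them; the helper name improvement is hypothetical, and it assumes that the percentages and the average parallelism in the text are truncated to whole numbers.

```python
def improvement(before, after):
    """Return (absolute saving, percentage saving truncated to a whole percent)."""
    saved = before - after
    return saved, (100 * saved) // before


instructions = improvement(624, 446)  # SI of the first and second run
steps = improvement(148, 133)         # SINF of the first and second run
tokens = improvement(865, 616)        # tokens through the queue

# Average parallelism SI/SINF, truncated as in the statistics listings.
parallelism_first = 624 // 148
parallelism_second = 446 // 133
```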

In  the  following  section,  the  algorithms  used  to  achieve  the  two 
optimizations  discussed  here,  block  enlargement  and  data  structure 
simplification,  are  described. 

VI.5 THE MODIFIED SCHEDULING ALGORITHM

The  SCHEDULE  procedure  described  in  the  simple  scheduling 
algorithm  is  also  used  in  the  new  scheduling  algorithm. 
Initialization  is  carried  out  as  in  the  simple  algorithm.  SCHEDULE 
calls SCHEDULE_GRAPH with

1)  the  component  graph  representing  the  entire  specification, 

2)  a  block  description  entry  for  the  outermost  block,  and 

3)  a  nesting  level  of  1 

as call parameters. A new version of the procedure SCHEDULE_GRAPH
incorporates the algorithms to do block enlargement and data structure
simplification.


VI.5.1 Criteria For Merging Components


SCHEDULE_GRAPH attempts to enlarge the scope of blocks in the
template by merging adjacent components of the component graph.
Components C and C' are said to be adjacent if there is an edge C->C'
or an edge C'->C. Let G be the component graph. Each member C of G
is an MSCC. If |C| > 1, then the component graph contains a cycle.
There is a path from any node in C to every other node in C. For
example, the array graph for Assertion 3 above is shown in Figure
6.3a. The graph contains a cycle because Factorial is both a source to
the  assertion  and  a  target  of  the  assertion.  Since  a  node  represents 
an  entire  array  instead  of  an  array  element,  there  appears  to  be  a 
cyclic  dependency.  Further  analysis  of  the  subscript  expressions 
reveals  that  the  cyclic  dependency  does  not  exist  in  the  Underlying 
Graph,  as  illustrated  in  Figure  6.3b.  The  program  generated  from  this 
specification  should  compute  the  array  iteratively  from  index  1  to  the 
range  of  I.  In  many  cases,  a  cycle  in  the  array  graph  indicates  the 
possibility  of  iterative  computation,  a  computation  in  which  the  value 
of  an  array  element  depends  on  the  values  of  elements  of  the  array  of 
lower indices. Since a cycle-containing component is not to be merged
with other components, only components C with |C| = 1 are candidates
for merger.


Components  are  merged  by  adding  new  edges  to  the  component  graph. 
If C1 and C2 are two components such that

1. |C1| = |C2| = 1

2. There is an edge from N2 in C2 to N1 in C1 (N2 -> N1)

3. Adding the edge does not cause an iterative computation to be
merged with a parallel computation

then a new edge is added from N1 to N2 (N1 -> N2). This will cause N1
and N2 to be in the same MSCC. There is an edge N2 -> N1 in the
original graph. When a new edge N1 -> N2 is added, a cycle is formed.
As successive pairs of single node components are considered, the
cycle may be enlarged, enlarging the number of elements in the
component and, therefore, the number of members in the parallel block
generated from the component.

It  should  be  noted  that  a  distinction  is  made  between  MSCC's  in 
the  original  array  graph  and  MSCC's  caused  by  new  edges  being  added  to 
the graph to merge single node components. MSCC's in the original
graph  reflect  data  dependencies  of  the  specification.  These  MSCC's 
usually  indicate  iterative  computation  of  successive  array  elements. 
The  MSCC's  caused  by  new  edges  added  to  the  graph  are  components  from 
which  parallel  blocks  are  generated. 
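
The effect of a back edge on the decomposition can be illustrated with a short Python sketch. The decomposition routine below uses Tarjan's algorithm, which is one standard way to find maximal strongly connected components; the text does not say which method the scheduler itself uses, and the function name msccs and the node names are assumptions made for the example.

```python
def msccs(graph):
    """Return the maximal strongly connected components of a directed graph.

    graph maps each node to an iterable of its successors.
    """
    index, low, on_stack, stack, result = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            result.append(frozenset(comp))

    for v in graph:
        if v not in index:
            visit(v)
    return set(result)


# Original graph: a single edge N2 -> N1, hence two single node components.
before = msccs({"N2": ["N1"], "N1": []})

# After the new edge N1 -> N2 is added, both nodes share one MSCC.
after = msccs({"N2": ["N1"], "N1": ["N2"]})

# Reversing every edge of a chain N3 -> N2 -> N1 enlarges the cycle.
chain = msccs({"N3": ["N2"], "N2": ["N1", "N3"], "N1": ["N2"]})
```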

Condition  3  ensures  that  each  time  an  edge  is  added  to  the  graph, 
addition  of  the  edge  does  not  cause  the  new  MSCC  being  constructed  to 
be merged with an MSCC of the original graph. Addition of the edge
under such circumstances would cause an iterative block and a parallel
block to be merged into one iterative block. As discussed above,
iterative  blocks  are  not  merged  with  parallel  blocks,  even  though 
doing  so  increases  the  number  of  elements  in  the  block.  It  is 
considered  more  important  to  retain  the  distinction  between  iterative 
and  parallel  blocks  in  the  generated  program  than  to  increase  the 
number  of  elements  in  an  iterative  block. 

Let N1 and N2 be the single nodes in C1 and C2 respectively (|C1|
= |C2| = 1) such that there is a path from N1 to N2 in the component
graph (N1 ->+ N2) which does not go through a node in a multi-node
MSCC from the original graph. Let M be any MSCC in the original
graph. Consider the following three relationships which could hold
between {N1, N2} and M.

1. There is an edge from M to N1 and an edge from M to N2.

2. There are edges from both N1 and N2 to M.

3. There is an edge from N1 to M and from M to N2.

Figures 6.4a, 6.4b, and 6.4c illustrate these relationships. The
fourth relationship, an edge from N2 to M and an edge from M to N1,
could not occur in the component graph. If it did, there would be a
path from N2 back to itself and from N1 back to itself, and M would
contain these nodes in the first place.


[Figures 6.4a, 6.4b, and 6.4c: edges between an MSCC M and the single
nodes N1 and N2]

Next, consider the effect on the component graph of adding edges
so that for each edge in the chain N1 ->+ N2, a new edge is added
in the reverse direction. This creates a new chain N2 ->+ N1.
If Case 1 above applies, then adding such a chain will not result in a
cycle between the new MSCC containing {N1, N2} and M. New edges are
only added between nodes in single node components, so a new edge
cannot be added from N1 to M. Similarly for Case 2, creating an MSCC
containing {N1, N2} does not cause a cycle between the new MSCC and M.

However, Case 3 poses a problem. If a chain is added back from
N2 to N1, then since there is already an edge from N1 to M and an edge
from M to N2, a new path is created from N1 back to itself, and from
N2 back to itself. A new
MSCC is created which contains both M and the chain N1 ->+ N2. An
MSCC which is to be part of a parallel block is merged with an MSCC
for an iterative block. Omitting any one edge in the chain from N2
back to N1 is sufficient to prevent this condition. The scheduler
(arbitrarily) omits the edge from the successor of N1 back to N1.

VI.5.2 Adding Edges To The Component Graph

SCHEDULE_GRAPH first finds the MSCC's of the component graph, as
in the simple scheduling algorithm. The procedure then scans each
component, starting with components with no successors and ending with
components with no predecessors. Let N be a node of a component C of
G, where |C| = 1. The scheduler forms the predecessor set of C, P(C).
P(C) consists of all other components C2 of G such that

1. |C2| = 1

2. There is an edge in the array graph from the node N2 in C2 to the
node N in C

3. Adding an edge to the component graph N -> N2 will not cause N and
N2 to become part of one of the original multi-node MSCC's.

From P(C), the scheduler forms a set of candidate components I(C)
for inclusion with C in the component being formed. A predecessor C2
is added to I(C) if all edges from the node in C2, N2, to the node in
C, N, have a Type 1 subscript expression from one or more unscheduled
dimensions D2i in N2 to the corresponding unscheduled dimensions Di in
N. In addition, each pair Di and D2i must belong to the same range
set R. C2 is said to be a candidate for inclusion in a common
component with C for the range R.

A node has zero or more dimensions. Each dimension belongs to a
range set. Let the set of range sets associated with a node N be
called RN. Let RC refer to the set of range sets associated with a
component C. In general,

   RC = Union(RNi), where Ni is in C.

In our case, RC = RN when N is in C, because we only consider
components C with |C| = 1. Each member of I(C), Cj, has a set of range
sets RCj associated with Cj. Let

   RCIj = Intersection(RCj, RC).

That is, RCIj contains those range sets which are in both RCj and in
RC. There may be cases in which there exists a range set R1 such that

   R1 in RCIi; R1 not in RCIj
   Ci, Cj in I(C).

For example,

Assertion 4: A(I,J,K) = X(I,K) + Y(K,J)

Let RX and RY refer to the set of range sets associated with the
components containing X and Y respectively. Let C be the component
containing Assertion 4. Then RX contains the range sets of I and K
({I,K}), and RY contains the range sets of J and K ({J,K}). RX and RY
overlap. The scheduler must resolve this overlap. Assertion 4 cannot
be in a common component with X for the range of I and at the same
time in a common component with Y for the range of J. Doing so would
place Y in a common component with X for the range of I. Since Y does
not have a dimension whose range set is the range of I, it would be
impossible to schedule Y in the block with range I.

This overlap occurs when

   Intersection(RCIj) <> Union(RCIj).

To resolve the situation, the scheduler computes Intersection(RCIj).
It then retains in I(C) those components Ci such that

   Intersection(RCIj) is contained in RCIi,

and retains as the common ranges only those ranges in
Intersection(RCIj). For the example above, the common set of range sets
is {K}, and the candidates for inclusion with C in a common component
for the range of K are {X,Y}.
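
The resolution of overlapping range sets can be sketched with Python sets. The function resolve_overlap and its argument layout are hypothetical; the sketch also assumes that a candidate is dropped when no range is common to all candidates, which the text does not state explicitly.

```python
def resolve_overlap(rc, candidates):
    """Resolve overlapping range sets among the candidates of a component C.

    rc         -- range sets of the component C
    candidates -- maps each candidate name to its range sets
    Returns (common ranges, overlap flag, retained candidate names).
    """
    # Restrict every candidate to the range sets it shares with C.
    shared = {name: set(r) & set(rc) for name, r in candidates.items()}
    if not shared:
        return set(), False, []
    common = set.intersection(*shared.values())
    union = set.union(*shared.values())
    overlap = common != union  # the candidates disagree on some range
    retained = sorted(n for n, r in shared.items() if common and common <= r)
    return common, overlap, retained


# Assertion 4: A(I,J,K) = X(I,K) + Y(K,J)
common, overlap, retained = resolve_overlap(
    {"I", "J", "K"},
    {"X": {"I", "K"}, "Y": {"K", "J"}},
)
```

For Assertion 4 the candidates disagree on I and J, so only K survives as a common range, and both X and Y remain candidates for that range.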

The scheduler then adds edges to the component graph. Adding
edges changes the connectivity properties of the graph. In particular,
adding edges causes new multi-node MSCC's to appear in the component
graph.  Parallel  blocks  in  the  data  flow  template  are  generated  from 
these  new  multi-node  MSCC's. 

For each component C2 in the inclusion set I(C), the scheduler
adds an edge from N in C to N2 in C2. The edge type indicates that
this is a "back" edge, that is, an edge back from the target of an
existing edge to the source of the edge. Adding this edge creates a
new cycle in G. The subscript expressions for the dimensions in
common are marked as Type 2 (I-1), so that they can be recognized in
later processing as a pseudo rather than a true cyclic dependency.

Once  the  new  edges  are  added,  the  graph  can  be  decomposed  once 
more  into  MSCC's.  If  back  edges  were  added,  then  new  MSCC's  will  have 
been  formed.  Each  component  is  now  given  as  parameter  to 
SCHEDULE_COMPONENT.  This  procedure  is  the  same  as  described  for  the 
simple scheduling algorithm, with two exceptions. SCHEDULE_COMPONENT
now recognizes "back" edges and creates a parallel block from a
component  which  contains  these  edges. 

The other change to SCHEDULE_COMPONENT is that data structure
simplification, the second optimization goal, is also performed in
this procedure. Once a new block has been created (Case 2 of
SCHEDULE_COMPONENT), data nodes in the component from which the block
is formed are examined to locate virtual subscript dimensions.

The component graph for EXAMPLE obtained from the new
SCHEDULE_GRAPH is as follows:

Component 1 has nodes: OUTFILE
Component 2 has nodes: INTERIM.XD
Component 3 has nodes: AASS220, INFILE1.A, INFILE1.INREC1,
                       INFILE2.B, INFILE2.INREC2, INTERIM.X,
                       AASS230, OUTFILE.C, OUTFILE.OUTREC
Component 4 has nodes: INFILE2
Component 5 has nodes: INFILE1
Component 6 has nodes: EXAMPLE

VI.5.3 The Revised SCHEDULE_GRAPH

The component enlargement analysis is performed in
SCHEDULE_GRAPH. The revised SCHEDULE_GRAPH performs the following
steps:

1.  Input  to  the  procedure  is  a  component  graph.  The  format  of  the 
component  graph  is  as  described  in  Chapter  5. 

2.  Build  a  list  of  predecessors  for  each  component.  If  there  is  an 
edge  E  from  Component  C2  to  Component  C,  then  C2  is  a  member  of 
CPREDS(C),  where  CPREDS(i)  consists  of  the  predecessors  of  Component 
i. 

Remove from CPREDS components for which adding an edge back from C to
C2 would cause the new MSCC to be merged with an MSCC in the original
component graph.

A function OKMscc checks whether a member of CPREDS should be removed.
OKMscc is called with the two nodes N2 in C2 and N in C such that
there is an edge in the array graph from N2 to N.

OKMscc does the following:

Assuming that an edge will be inserted from N in C to N2 in C2,
construct the MSCC M containing N and N2. If any member of M was a
member of a multi-node MSCC from the original component graph, then
return false: the back edge should not be inserted. Otherwise, return
true: it is safe to insert the back edge.

If OKMscc returns false for some predecessor N2 of N, the component C2
containing N2 is removed from CPREDS.
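
A minimal Python sketch of the OKMscc test follows. It is not the original implementation: the name ok_mscc, the adjacency-map representation, and the argument original_multi (the set of nodes that belong to multi-node MSCC's of the original graph) are assumptions. The MSCC containing N after the trial edge is found as the set of nodes that both reach N and are reachable from N.

```python
def reachable(graph, start):
    """Return the set of nodes reachable from start (start included)."""
    seen, todo = {start}, [start]
    while todo:
        v = todo.pop()
        for w in graph.get(v, ()):
            if w not in seen:
                seen.add(w)
                todo.append(w)
    return seen


def ok_mscc(graph, original_multi, n, n2):
    """Return True if the back edge n -> n2 may be inserted safely."""
    trial = {v: set(ws) for v, ws in graph.items()}
    trial.setdefault(n, set()).add(n2)  # tentatively insert the back edge
    reverse = {}
    for v, ws in trial.items():
        for w in ws:
            reverse.setdefault(w, set()).add(v)
    # Nodes on a cycle through n form the MSCC that would contain n and n2.
    component = reachable(trial, n) & reachable(reverse, n)
    return component.isdisjoint(original_multi)


m_nodes = {"M1", "M2"}  # a two-node MSCC of the original graph

# Case 1: M has edges to both single nodes; N2 -> N is the existing edge.
case1 = {"M1": {"M2", "N2", "N"}, "M2": {"M1"}, "N2": {"N"}, "N": set()}

# Case 3: the predecessor N2 feeds M, and M feeds N.
case3 = {"N2": {"M1", "N"}, "M1": {"M2"}, "M2": {"M1", "N"}, "N": set()}

safe = ok_mscc(case1, m_nodes, "N", "N2")
unsafe = ok_mscc(case3, m_nodes, "N", "N2")
```

In the Case 3 graph the trial edge closes a cycle through M1 and M2, so the function rejects it, which corresponds to the back edge being omitted.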

Steps  3-6  are  performed  for  each  single-node  component  in  the 
component  graph. 

3. Build the Candidate data structure. This involves locating those
predecessors of the component which are candidates for inclusion in
the same parallel block as the current component.

dcl 01 Candidate(ndim) based(p_candidate),
       02 range_num fixed bin,         /* range set number
                                          for each node dimension */
       02 one_cand(comp_cnt) bit(1);   /* '1'b ->
                                          the comp is a candidate */


A predecessor is added to Candidate if 1) the predecessor forms a
single node MSCC, and 2) adding an edge from C to the predecessor will
not cause a merger of the new component with a multi-node component
from the original component graph. ndim is the number of node
dimensions. range_num is the range set number for the dimension.
one_cand is a bit vector. A true value for an element of one_cand
indicates that the indexed component is a candidate for inclusion in a
parallel block with the current component for the indicated range.
The Candidate structure is built as follows:


For each dimension I of N in C, the node being processed, do
{ For each member Ci of CPREDS do
  { Let Ni be the name of the node in Ci.
    Locate a dimension I' in Ni with the
    same range as I, R(I).
    If such a dimension exists then do
    { Look at the edges from Ni to N.
      If all edges Ei have a Type 1 subscript expression
      at the I' dimension then add Ci to
      Candidate at the dimension I.
    }
  }
}
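
The loop above can be sketched in Python. The data layout is an assumption made for the example: each node carries a list of range set names, one per dimension, and each edge is a mapping from a dimension position of the source node to its subscript type. The function name build_candidate is hypothetical, and a set of candidate names stands in for the one_cand bit vector.

```python
def build_candidate(n_ranges, preds, pred_ranges, edges):
    """Build one Candidate entry per dimension of the node N.

    n_ranges    -- range set of each dimension of N
    preds       -- names of the single node predecessors (CPREDS)
    pred_ranges -- node name -> list of range sets, one per dimension
    edges       -- node name -> list of edges to N; each edge maps a
                   dimension position of the source to a subscript type
    Returns a list of (range set, set of candidate names).
    """
    candidate = []
    for r in n_ranges:
        cands = set()
        for ni in preds:
            if r not in pred_ranges[ni]:
                continue  # Ni has no dimension I' with the range R(I)
            d = pred_ranges[ni].index(r)
            if all(e.get(d) == 1 for e in edges[ni]):  # Type 1 on every edge
                cands.add(ni)
        candidate.append((r, cands))
    return candidate


# Assertion 4: A(I,J,K) = X(I,K) + Y(K,J), all subscripts of Type 1.
cand = build_candidate(
    ["I", "J", "K"],
    ["X", "Y"],
    {"X": ["I", "K"], "Y": ["K", "J"]},
    {"X": [{0: 1, 1: 1}], "Y": [{0: 1, 1: 1}]},
)
```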


4. Form the intersection of the candidate ranges. The intersection
data  structure  consists  of  two  parts. 


dcl 01 intersec,
       02 i_ranges(ndim) fixed bin,
       02 i_sec(comp_cnt) bit(1);


This  data  structure  is  built  as  follows: 




{ For i ranging from 1 to ndim,
  { for j ranging from 1 to comp_cnt,
      if candidate(i).one_cand(j) = '1'b then
        if, for all k ranging from 1 to ndim,
            candidate(k).one_cand(j) = '1'b
        then
          { set i_ranges(i) = candidate(i).range_num;
            set i_sec(j) = '1'b;
          }
        else set i_ranges(i) = 0;
  }
}


5. Check to see whether any members of intersec must be discarded
because of the partial order relation over range sets. Suppose there
is an edge from Ni to N, where Ni is in Ci and Ci is a member of
intersec, and where the edge contains a Type 1 subscript expression
for dimension I' of Ni corresponding to I of N. Now suppose that I"
is another dimension of Ni and the range set corresponding to I",
R(I"), precedes the range set corresponding to I', R(I'). That is,
suppose R(I") must be defined before R(I') can be known. This would
occur if the SIZE or END qualifiers were used to define I', and I"
were an argument to the SIZE or END expression. If I" cannot be
scheduled before I, then Ci must be removed from intersec for the
dimension I. This information is gathered by examining the ralp field
of the LOCALi_SUB corresponding to I'. ralp points to a list of other
dimensions of the node Ni which must precede the dimension I'. The
following check is performed for each entry in the ralp list. The
dimension referenced in the ralp entry precedes (in the partial order
on range sets) the dimension being processed.


For each "ralp" entry in the list, do the following:
{ If the dimension in the ralp entry has been
  scheduled then continue.
  If the dimension has not been scheduled, and the
  range corresponding to that dimension is not a
  member of intersec, then discard the node and exit.
}
If all dimensions in the ralp list pass,
then keep the node in the intersection set.
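
The check in step 5 amounts to a short loop, sketched here in Python. The function keep_in_intersec and its arguments are hypothetical: ralp is taken to be a plain list of dimension names, scheduled a set of dimensions already scheduled, and range_of a mapping from a dimension to its range set.

```python
def keep_in_intersec(ralp, scheduled, intersec_ranges, range_of):
    """Decide whether a node stays in the intersection set.

    ralp            -- dimensions that must precede the dimension I'
    scheduled       -- dimensions that have already been scheduled
    intersec_ranges -- range sets currently in intersec
    range_of        -- maps a dimension to its range set
    """
    for dim in ralp:
        if dim in scheduled:
            continue  # the preceding dimension is already taken care of
        if range_of[dim] not in intersec_ranges:
            return False  # the preceding range cannot be scheduled first
    return True


ranges = {"J": "RJ"}
already_scheduled = keep_in_intersec(["J"], {"J"}, set(), ranges)
in_intersec = keep_in_intersec(["J"], set(), {"RJ"}, ranges)
discarded = keep_in_intersec(["J"], set(), {"RK"}, ranges)
```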


6. For each candidate which is left in intersec, insert a new edge
from the node N to Ni, the single node in Ci. This "back" edge will
cause an MSCC to be formed which will include nodes N and Ni. The
MSCC will be the basis for a parallel block.


7.  Once  back  edges  have  been  inserted  for  each  eligible  node,  the 
array graph is again divided into MSCC's. Each MSCC is then submitted
to  the  procedure  SCHEDULE_COMPONENT. 


8. SCHEDULE_COMPONENT adds a member M to the current block
description entry. If M is of type "block", SCHEDULE_COMPONENT calls
FindVirtual(M) (described below) to mark virtual dimensions of data
nodes defined in M.


9. When all components have been scheduled, SCHEDULE_GRAPH returns the
composite schedule.


VI.5.4 Locating Virtual Dimensions

The procedure FindVirtual is called by SCHEDULE_COMPONENT to
locate virtual dimensions of data nodes. Input to FindVirtual is the
block description entry B of the block member constructed by
SCHEDULE_COMPONENT. Let R be the range for B. FindVirtual performs
two functions.

1.  If  the  block  type  of  B  is  iterative,  look  for  virtual  dimensions 
of  data  nodes  defined  in  B. 

For each data node defined in B, let D be the name of the data node.
Mark the dimension of D corresponding to R as virtual if each edge
from D (source) to an assertion (target) is in the following form:

1a. The edge has a subscript expression of Type 1, 2 or 3 in the
distinguished dimension and the target is in B,

and, optionally,

1b. The edge has a Type 4 subscript expression in the
distinguished dimension, and the subscript expression is SIZE.<name
of D>. This edge indicates that the target depends only on the
last element of D.
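
The virtual-dimension test can be illustrated with a short Python sketch. The `Edge` record and its field names are hypothetical, introduced only for this example. The sketch also assumes one reading of "and, optionally": every edge must match form 1a or form 1b, and edges of form 1b need not be present.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    """One edge from data node D to an assertion (illustrative fields)."""
    subscript_type: int              # Type 1..4 in the distinguished dimension
    target_in_block: bool            # is the target assertion a member of B?
    is_size_of_source: bool = False  # subscript is SIZE.<name of D>


def is_virtual(edges):
    """True if the dimension of D corresponding to R may be marked virtual."""
    def acceptable(edge):
        form_1a = edge.subscript_type in (1, 2, 3) and edge.target_in_block
        form_1b = edge.subscript_type == 4 and edge.is_size_of_source
        return form_1a or form_1b

    return all(acceptable(edge) for edge in edges)
```

Note that the sketch reports a node with no outgoing edges as virtual, which is a property of `all` on an empty list and not a statement about the scheduler.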

2. Construct a table, Local, of local data nodes. Each entry of the
table has two fields: Node_id, the data node id; and Block_ix, the
index of the block to which Node_id is local.

For each data node defined in B, let D be the name of the data node.
D is a local data node if for each edge from D to an assertion, the
assertion is also a member of B. If D is found to be a local data
node, then do the following:

If every dimension of D has been scheduled, and each edge from D to an
assertion has a Type 1 subscript expression in every dimension, then
add (D, Block Number of B) to Local.
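
A minimal Python sketch of the construction of Local follows. The dictionary layout used for the data nodes (`scheduled`, `edges`) is an assumption made for the illustration and is not the Model Processor's internal representation.

```python
def local_entries(block_number, data_nodes):
    """Return the (Node_id, block index) pairs to add to the table Local.

    data_nodes maps a node name to a dict with two hypothetical keys:
      'scheduled' -- True if every dimension of the node has been scheduled
      'edges'     -- list of (target_in_block, subscript_types) pairs, one
                     per edge from the node to an assertion, where
                     subscript_types holds one type number per dimension
    """
    local = []
    for name, info in data_nodes.items():
        edges = info["edges"]
        # D is local if every assertion it feeds is also a member of B.
        is_local = all(in_block for in_block, _ in edges)
        # Every edge must use a Type 1 subscript in every dimension.
        all_type_1 = all(t == 1 for _, types in edges for t in types)
        if is_local and info["scheduled"] and all_type_1:
            local.append((name, block_number))
    return local
```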

VI.6 CONCLUSION

This concludes the discussion of efficiency considerations in
scheduling for a data flow machine. We have described algorithms
which enlarge the scope of generated blocks and which simplify the
structure of data. The topic of the next chapter is code generation,
the problem of generating from the data flow template a program in the
MaD language.


parameters and their types. Next is the data type of the result
returned by the program. After the program header, the variables used
in the program and their data types are listed in the global
declarations section of the program. In this section, the variables
used to hold the value returned by the program (called the output
parameters in the following discussion), interim variables, and any
special variables used in the specification (such as END or SIZE
variables) are declared. After the data declarations come the
"assignment" statements. These statements define the values of the
variables. An assignment statement may have as the right hand side a
simple arithmetic expression or a more complicated MaD block. The
assignment statements are generated systematically from the block
description section of the template. If a member of the block being
processed is an assertion, an assignment statement of the simple sort
is generated. If the block member is itself of type block, an
assignment statement with a right hand side consisting of a MaD
block is generated. A block member of type block is then processed
recursively. In this way, blocks nested to an arbitrary level may be
generated. The final section of the MaD program generated by the
Model Processor is the return statement. A MaD program must return a
value upon termination. The value returned may be simple or
composite. The return value is composed of the values computed by the
assignment statements for the output parameters of the program. The
form of the generated program is shown in Figure 7.1.
I. Declarations

A.  The  Program  Header 

1.  Program  Name 

2.  Input  Parameter  Names  and  Types 

3.  Result  Type 

B.  Global  Declarations 

1.  Interim  and  Special  Variable  Names  and  Types 

2.  Output  Parameter  Names  and  Types 

II.  Assignment  Statements 

A.  Simple  Assignments 

B.  Nested  Blocks 

III.  Return  Statement 


Figure 7.1 The Generated Program Structure


To demonstrate the correspondence between data flow template and
MaD program, the structure of the MaD program generated for the
EXAMPLE template (Figure 6.2) is shown in Figure 7.2. The EXAMPLE
template contains the following information:

1) The file description section of the template contains two input
files containing repeating fields A and B respectively, one output
file containing repeating field C, and an interim file containing the
repeating field X.

2) There are two blocks in the block description section of the
template. The first is the outer block, representing the entire
program. The second block is nested in the outer block. The two
assertions are nested within the second block.


I.  Declarations 

A.  The  Program  Header 

1.  Program  Name:  EXAMPLE 

2.  Input  Parameter  Names  and  Types: 

Integer streams A and B

3.  Result  Type:  Integer  stream 

B.  Global  Declarations 

1.  Interim  Variable  Names  and  Types: 

None  (since  X  is  local  to 
a  nested  block) 

2.  Output  Parameter  Names  and  Types:  Integer  stream  C 

II. Assignment Statements

A. Simple Assignments: None

B.  Nested  Blocks:  There  is  one  nested  block. 

It  contains  assignment  statements  for  X  and  C. 

III.  Return  Statement: 

The value of C is returned as the program result.


Figure 7.2 Structure of the EXAMPLE Program

The following sections describe the algorithms used to transform
the data flow template into a MaD language program. The next section
describes limitations of the MaD language which are more restrictive
than the Model language. Then, an overview of the code generation
phase is presented. Each part of code generation is discussed:
generating the data declarations, the assignment statements, and the
return statement.


VII.2 RESTRICTIONS OF THE MAD LANGUAGE


MaD has several limitations which are more restrictive than those
of the Model System. MaD permits dimensions of a multi-dimensional
structure to be constructed only in a hierarchical sequence. The
least significant dimension must be defined first, followed by the
next least significant dimension, and so on. If M(dim1, dim2) is a
matrix, the restriction dictates that every element of the first row
of M must be defined before an element of the second row is defined.
This restricts the flexibility of algorithm construction rather than
the scope of algorithm which can be defined in MaD. The Model data
flow scheduler chooses any unscheduled dimension of a component from
which to construct a block. The choice may be limited by precedence
relations among range sets. However, if a selection is not limited by
such precedence relationships, the scheduler does not restrict the
nested block structure to be a hierarchical definition of node
dimensions as does MaD. It does not require that a node's dimensions
be defined from least significant to most. Therefore, some valid data
flow templates produced by the scheduler may not be translatable to
MaD.

A second restriction is that the input and output parameters of a
program may not be of generalized tree structure. There can be only
one leaf node in the tree structure for the parameter. Thus, although
a parameter may be multi-dimensional, it must be of elementary base
type. Because of this restriction, Model specifications which require
input and output data formatted as generalized trees cannot be
translated into MaD.

The third limitation of MaD which affects translation is that a
complete dimension of a multi-dimensional structure must be defined in
one expression. Individual elements may not be defined separately.
For example, if A is a one-dimensional array, a definition of the form
"A(6) := 15" is not permitted. Instead, the definition must take the
form "A := <expression>", where the <expression> evaluates to a
stream, the MaD equivalent of a Model one-dimensional array. Because
of this restriction, Model assertions with generalized subscript
expressions on the left hand side cannot be translated into MaD.

This restriction also means that an entire record or group must
be defined by a single expression. A data flow template in which the
components of a record or group are not all defined within the same
block is not directly translatable to MaD. Therefore, structured
interim data is transformed to an equivalent but simpler form in the
generated MaD program. The simple tree structure used for interim
variables is the same as the structure required of input and output
parameters in MaD (described above).

VII.3 ORGANIZATION OF THE CODE GENERATION PHASE

The procedure Codegen in the Model Processor is responsible for
generating a MaD program from the data flow template. Data structures
used by Codegen include the data flow template, the table Local
created by the scheduler, and the node attribute table or dictionary
created prior to scheduling. Codegen calls on three auxiliary
procedures to generate the data declarations, the assignment
statements, and the return statement. The procedure GenDcl handles
the declarations. Input to GenDcl are 1) the data description section
of the data flow template, 2) the table Local, and 3) the node
attribute table. GenDcl produces all the global declarations. The
procedure GenBlk generates the assignment statements, both simple
statements and statements which contain nested blocks. Input to
GenBlk are 1) the block description of the data flow template, 2) the
table Local, and 3) the node attribute table. GenBlk uses several
procedures to generate parts of the assignment statements. Procedure
GenAssr generates a simple assignment statement from an assertion.
Procedure LocalVar generates local variable declarations in nested
blocks. Procedure ForEach generates the body of a parallel block.
Procedure Iter generates the body of an iterative block. The
procedure Ret, called by GenBlk, ForEach, and Iter, generates a Return
statement.


VII.4 GENERATING DATA DECLARATIONS

Declarations are generated for the program header, the interim
variables, and the output parameters. The program header is generated
first.



VII.4.1 The Program Header


The  program  header  consists  of  the  program  name,  input 
parameters,  and  output  result  type. 


<programheader> ::= PROGRAM <program-id>
                    [ <parameterlist> ] <result>

<parameterlist> ::= '(' <parmlist>
                    [ ';' <parmlist> ]* ')'

<parmlist>      ::= <parmid> [ ',' <parmid> ]* ':'
                    [ [STORED] STREAM [STREAM]* ]
                    <typeid>

<result>        ::= '[' <result> [ ',' <result> ]* ']'
                    | [ [STORED] STREAM ] <typeid>


Example: 

PROGRAM FIBONACCI(N: INTEGER): STREAM INTEGER;

The program name is FIBONACCI. There is one input parameter, N, of
type integer. The output parameter is of type stream, with the base
type of the stream as integer.

The  name  for  the  program  is  taken  from  the  name  of  the 
specification.  Declarations  for  the  input  and  output  parameters  are 
generated  from  the  data  description  section  of  the  data  flow  template. 
Data  description  entries  referring  to  files  of  type  SOURCE  are  used  to 
generate  the  input  parameter  list.  TARGET  file  entries  are  used  to 
generate  the  output  parameter  list. 




A file node is the root of a generalized tree. Recall from
Chapter 4 that each data node has attributes SON1, the node number of
the leftmost descendant of the node, and BROTHER1, the node number of
the sibling to the immediate right of this node. By following the
SON1 field of a file node and the BROTHER1 field of descendants of the
file node, all the fields contained in a file may be accessed. The
input and output parameters are generated from the descendants of the
source and target file nodes respectively. In the restricted form of
tree used for input and output parameters, each data node has zero or
one son and zero brothers.
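
The traversal by SON1 and BROTHER1 can be sketched as follows. The sketch assumes that the two attributes are available as Python dictionaries keyed by node number and that a missing key means "no son" or "no brother"; how the node attribute table actually encodes a missing link is not stated here.

```python
def descendants(file_node, son1, brother1):
    """List every node below file_node, visiting leftmost sons first.

    son1, brother1 -- dicts from node number to node number; a missing
    key stands for "no such node" (an assumption of this sketch).
    """
    found = []

    def visit(node):
        while node is not None:
            found.append(node)
            visit(son1.get(node))      # descend to the leftmost son
            node = brother1.get(node)  # then move to the right sibling

    visit(son1.get(file_node))
    return found
```

For the restricted trees used for parameters, the result is a simple chain ending in the single leaf node.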

A MaD input parameter declaration is generated as follows:

The  parameter  name  is  the  name  of  the  leaf  node  of  the  tree  whose  root 
is  an  input  file.  The  base  type  of  the  parameter  is  the  data  type 
associated  with  the  leaf  node.  The  dimensionality  of  the  leaf  node 
determines  the  number  of  "STREAM"  prefixes  to  which  the  base  type  is 
appended.  For  example,  for  the  Model  data  declaration 

MAT IS RECORD (ROW (10));

ROW IS GROUP (COL (10));

COL  IS  FIELD  INTEGER; 

the  MaD  parameter  name  is  COL;  the  base  type  is  integer;  and,  since 
the  dimensionality  of  COL,  the  leaf  node,  is  two,  there  are  two 
"STREAM"  prefixes.  The  parameter  declaration  is  as  follows: 

COL:  STREAM  STREAM  INTEGER 


If a node dimension is virtual, then the STREAM prefix for that node
dimension is omitted in the parameter declaration. For example, if
the Model declaration for an array used to compute the factorial of a
number is

FAC IS FIELD (*) (FIXED BINARY);

and dimension 1 of FAC is found by the scheduler to be virtual, then
the MaD declaration

FAC: INTEGER;

is generated instead of

FAC: STREAM INTEGER;
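
The rule for building a parameter declaration can be written as a small Python function. The function name `mad_declaration` and its arguments are hypothetical; the sketch assumes the base type has already been mapped to its MaD name (for example, FIXED BINARY to INTEGER) and that the caller lists only virtual dimensions that belong to the node.

```python
def mad_declaration(name, base_type, dimensionality, virtual_dims=()):
    """Build '<name>: STREAM ... <base type>' for a leaf data node.

    One STREAM prefix is emitted per non-virtual dimension.
    virtual_dims holds the numbers of the dimensions marked virtual.
    """
    stream_count = dimensionality - len(set(virtual_dims))
    return f"{name}: " + "STREAM " * stream_count + base_type
```

Applied to the two examples above, `mad_declaration("COL", "INTEGER", 2)` yields the COL declaration with two STREAM prefixes, and `mad_declaration("FAC", "INTEGER", 1, {1})` yields the FAC declaration with none.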

The output parameters are formed in the same way. However, only
the data type of the output parameter is specified, not the parameter
name. This is similar to the function result declaration in Pascal.
If there is more than one TARGET file, then the output parameter list
is enclosed in brackets, for example, [STREAM INTEGER, REAL]. This
notation allows the program to return a composite result, a record.
The first field of the record is a STREAM INTEGER; the second field
is a REAL number.

VII.4.2 Data Declarations For Global Variables

After the program header is generated, GenDcl generates
declarations for global variables. Variables in the interim "file" of
the file description section of the template are processed. If such a
variable is not local to a nested block, a MaD declaration for the
variable is generated. Following these interim variables come
declarations for the variables which hold the program result. The
target file descriptions in the template are processed to obtain these
variables. The syntax of variable declarations in MaD is shown below.
The <typedefn> refers to the data type of the variable.

<blockdeclarations> ::= <id> [ ',' <id> ]* ':'
                        <typedefn> ';'
                        [ <id> [ ',' <id> ]* ':'
                        <typedefn> ';' ]*

VII.4.2.1 Interim Variables -

There is one entry in the data description portion of the data flow
template for the interim "file." Any variables declared in the Model
specification which are not part of a source or target file are
members of the interim "file." The data structure of an interim item
in Model is also a generalized tree. However, the MaD STRUCT
construct, which is provided to describe a variable whose data
structure is a generalized tree, is not used in code generation. This
is because there are several restrictions in MaD on definition and
usage of fields in a data structure declared with a STRUCT data type.
Instead, the tree for the interim variable is transformed to a set of
restricted (as opposed to generalized) trees. Each restricted tree
has the form of the tree for an input or output parameter, that is, a
zero or greater level tree with a single leaf node. A data
declaration is generated for each restricted tree using the method
outlined above for input and output parameters. An example of data
declaration for interim variables is as follows:

Example:

Let an interim variable in a Model specification be declared as
follows:

SCORES IS GROUP (100) (SCORE_1, SCORE_2);

SCORE_1 IS FIELD (FIXED BINARY);

SCORE_2 IS FIELD (FIXED BINARY);

The data structure of SCORES is illustrated in Figure 7.3a. This data
structure is transformed into the data structure illustrated in Figure
7.3b. The MaD declaration generated for this example is as follows:

SCORE_1: STREAM INTEGER;

SCORE_2: STREAM INTEGER;
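
The transformation of a generalized tree into one declaration per leaf can be sketched in Python. The tuple encoding of a tree node, (name, repeating flag, body), is an assumption made for this illustration, as is the premise that each repeating ancestor contributes one dimension to every leaf beneath it. Virtual dimensions are ignored here.

```python
def leaf_declarations(node, dims=0):
    """Return one MaD declaration per leaf of a generalized tree.

    node is a hypothetical tuple (name, repeating, body), where body is
    either a base type string (a leaf) or a list of child nodes.
    dims counts the repeating levels met on the path from the root.
    """
    name, repeating, body = node
    if repeating:
        dims += 1
    if isinstance(body, str):  # leaf: emit its own restricted tree
        return [f"{name}: " + "STREAM " * dims + body + ";"]
    declarations = []
    for child in body:
        declarations.extend(leaf_declarations(child, dims))
    return declarations
```

With SCORES encoded as a repeating group over two integer fields, the function returns the two declarations shown in the example above.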

Declarations  for  interim  data  which  the  scheduler  found  to  be 
local  to  a  particular  block  are  not  generated  in  the  outer  block. 
Instead,  these  declarations  are  generated  in  the  block  in  which  the 
data  is  defined.  Given  the  template  for  specification  EXAMPLE,  the 
following  declarations  are  generated  in  MaD: 

PROGRAM EXAMPLE(

A: STREAM INTEGER;

B: STREAM INTEGER
): STREAM INTEGER;

DECLARE  C:  STREAM  INTEGER; 


A  declaration  is  not  generated  for  X,  because  X  is  a  local  data  node. 


VII.4.2.2 Variables To Hold The Program Result -

The output parameters, which correspond to the target files in
the specification, are declared next. Only the data types of these
variables are declared in the <result> portion of the program header.
For example, if a record for the target file in a Model specification
is declared as follows:

RESULT IS RECORD (RES(50));

RES IS FIELD (INTEGER);

then the MaD output parameter declaration is STREAM INTEGER, and the
declaration in the <blockdeclarations> is

RES: STREAM INTEGER

The output parameters are declared using the same method as is used to
generate input parameters. The name associated with the leaf node is
used as the variable name. The data type is generated from the leaf
node data type as for the input parameters.

VII.5 GENERATING THE ASSIGNMENT STATEMENTS

The next part of the MaD program consists of a definition for
each of the variables declared in the global declarations. This
definition section is generated as one or more assignment statements,
a MaD <let> clause. The format of a <let> clause is

<let> ::= LET <lhs> ':=' <expression>
          [ ';' <lhs> ':=' <expression> ]*

<lhs> ::= <variable id> |
          '[' <variable id> [ ',' <variable id> ]* ']'

The second form of <lhs> is a composite definition. If this form is
used, the expression on the right hand side must evaluate to a list of
results. Each result in the list must match the data type of the
corresponding component of the list on the left hand side. For
example, if I1 is of type integer and R1 is of type real, then the
following composite definition is correct:

[I1, R1] := [15, 0.9];

The <let> clause is generated from the block description portion
of the data flow template. Each block description entry consists of
two lists. The first is a list each element of which has two
components, the data node defined in the block and which dimension of
the data node has been defined. The second is a list of members of
the block. A member can be either an assertion or another, nested
block. If a block member is an assertion, then a MaD definition is
generated from the assertion. Procedure GenAssr transforms the
assertion into a simple assignment statement.

VII.5.1 Procedure GenAssr

GenAssr modifies the text of the assertion to conform to MaD
syntax. For example, instead of the Model '=' separating the left and
right hand sides of an assertion, a ':=' is used in the MaD form. The
name of a variable used in the Model specification may be modified in
the MaD program. The MaD version of the variable's name might have
fewer subscript qualifiers than the Model version. This would occur
if one or more of the variable's dimensions were found by the
scheduler to be virtual. Another difference is that a qualified
variable is not permitted on the left hand side of a MaD definition.
Therefore, subscripts are omitted from the variable on the left hand
side of the assertion when the assertion is output as a MaD
definition. MaD permits a qualifier 'NEW' for the variable being
defined. The 'NEW' qualifier is used for variables defined in an
iterative block. At each iteration, a new instance of the variable is
defined. For example, if a variable FAC is defined by an assertion

FAC(I) = IF I = 1 THEN 1 ELSE I * FAC(I-1)

and dimension 1 of FAC is virtual, then the MaD equivalent is

NEW FAC := IF I = 1 THEN 1 ELSE I * FAC
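
The textual rewriting performed on an assertion can be approximated with a few lines of Python. This is a simplified sketch, not GenAssr itself: it works on the assertion as a string, assumes that the first '=' separates the two sides, and assumes that every subscript of a variable listed in `virtual_vars` is virtual, so the whole parenthesised qualifier can be dropped. The function and parameter names are invented for the example.

```python
import re


def gen_assr(assertion, virtual_vars=(), iterative=False):
    """Rewrite a Model assertion 'V(subs) = rhs' as a MaD definition."""
    lhs, rhs = assertion.split("=", 1)   # first '=' separates the sides
    name = lhs.split("(", 1)[0].strip()  # drop subscripts on the left side
    rhs = rhs.strip()
    for var in virtual_vars:
        # Drop the subscript qualifier of a fully virtual variable.
        rhs = re.sub(rf"\b{re.escape(var)}\([^()]*\)", var, rhs)
    prefix = "NEW " if iterative else ""
    return f"{prefix}{name} := {rhs}"
```

Called on the FAC assertion with `virtual_vars={"FAC"}` and `iterative=True`, the sketch reproduces the MaD definition given above. A variable with a mixture of virtual and non-virtual dimensions would need a real parser of the subscript list.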

VII.5.2 Procedure GenBlk

GenBlk generates a MaD block from a block description entry in
the template. The MaD program outer block is constructed from the
first block description entry. A MaD block is constructed in two
steps. Simple assignment statements are generated first for data
nodes defined directly in the block. Then, definition statements are
generated for data nodes defined in nested blocks. In the latter form
of assignment statement, the right hand side is a nested MaD block.
Procedure GenBlk, given a block description entry, constructs the MaD
block. Input to GenBlk is the block description entry, B0.


Procedure GenBlk(B0);

1. Generate definition statements for data nodes defined directly in
B0.

For each data node D defined in B0
{ Find the block member, A, which defines D. A is of type
"assertion". Call GenAssr to generate a simple assignment
statement for the assertion. }

2. Generate definition statements for data nodes defined in nested
blocks.

For each member B of B0 which is of type "block",

{ Let the list of all data nodes defined in B be called Lhs.

Output the list Lhs as the left hand side of the MaD definition.
If an element of Lhs is a local data node, then do not output
that element on the left hand side of the definition. A local
data node is declared and used only within the scope of B, on
the right hand side of the definition.

Generate the right hand side of the definition. The right hand
side is a block with data locally defined. Output a
declaration for the <range set name> of the range set
associated with B. The form of the declaration is 'DECLARE
<range set name> : INTEGER;'.

Next, output data declarations for a local copy of each variable
in Lhs. A declaration is generated for local data nodes as
well as globally declared variables. Procedure LocalVar
produces the nested block declarations.

If the block type is parallel, call ForEach to generate a
parallel block.

Otherwise the block is iterative. Call Iter to generate an
iterative block.

}

Construct a <return> statement. The statement generated is of the
form 'RETURN <variable list>'. The <variable list> corresponds
to Lhs. However, the name generated for the local copy of the
Lhs name is used. Procedure Ret constructs this list.


VII.5.3 Procedure LocalVar


LocalVar is called by GenBlk with parameters Lhs, the list of
data nodes defined in a block, and Level, the nesting level. This
procedure generates the data declarations for the nested block on the
right hand side of the definition.

Let D be an element of Lhs. Examine the table Local constructed
by FindVirtual during scheduling. If D is in Local, then omit the
declaration for D unless Level is equal to the dimensionality of D.
This is done so that a local data node is only declared in the
most deeply nested block in which it is produced and used.

Now consider the dimensionality of D. D is a leaf data node of
dimensionality n. Level represents the most significant dimension of
D which will be used to generate the declaration. For example, if
Level is 2 and D has three dimensions, D(I,J,K), then the second and
third dimensions of D, those represented by J and K, are used to
generate the declaration. The dimension corresponding to Level, that
of J, is the most significant dimension. The dimensionality of the
local copy of D is at most (n-Level)+1. Let the dimensions of D be
numbered from most significant to least significant. Examine the node
subscripts of D from dimension # Level to dimension # n. If a
dimension of D among those examined is virtual, the dimensionality of
the local copy is reduced by one. For example, let Level = 1 and a
data node FAC have dimensionality 2. The maximum dimensionality of
the local copy of FAC is (n-Level)+1 = (2-1)+1 = 2. However, if the
most significant dimension of FAC is virtual, the dimensionality of
the local copy of FAC is 1. The dimensionality of the variable
determined by this calculation is reduced by one if the variable is
not in the table Local.

The algorithm for LocalVar is as follows:


For each D in Lhs which is either non-local or is
local to the current block:

Output the name of D followed by a ':'.

Initialize the dimension count to 0.

Examine each dimension of D,
starting at the most significant
dimension Level. If the dimension
is not virtual, add 1 to the
dimension count.

If D is not in the table Local, decrement
the dimension count of D by 1.

Output the data type of D, for example,
'INTEGER' or 'REAL'.
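
The dimension count computed by LocalVar can be sketched as follows. The function name and argument layout are hypothetical. The sketch also assumes that the final dimension count is rendered as that many STREAM prefixes, by analogy with the parameter declarations of section VII.4.1; the algorithm above does not state this step explicitly.

```python
def local_var_decl(name, base_type, virtual, level, in_local):
    """Build the nested-block declaration for data node `name`.

    virtual  -- list of booleans, one per dimension of the node, ordered
                from most significant to least significant
    level    -- nesting level; dimensions level..n are examined (1-based)
    in_local -- True if the node appears in the table Local
    """
    # Count the non-virtual dimensions from dimension `level` onward.
    count = sum(1 for is_virtual in virtual[level - 1:] if not is_virtual)
    if not in_local:
        count -= 1  # non-local variables lose one dimension
    # Assumption: each remaining dimension becomes one STREAM prefix.
    return f"{name}: " + "STREAM " * max(count, 0) + base_type
```

For the FAC example above (Level 1, dimensionality 2, most significant dimension virtual, FAC in Local) the count is 1.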


VII.5.4 Procedure ForEach

This procedure generates the "FOR EACH" statement for a parallel
block. The call parameter to ForEach is the parallel block Block.
The form of the MaD <foreach> statement used is

<foreach> ::= 'FOR EACH' <variable name> 'IN' <stream> 'DO'
              <lhs> ':=' <expression>
              [ ';' <lhs> ':=' <expression> ]+
              'RETURN' <expression>

ForEach first generates the block header, the first line of the
<foreach> definition above. The <variable name> is the <range set
name> of the range set for Block. The <stream> consists of a sequence
of indices, from 1 to the range set maximum. It is constructed by a
standard procedure PROLIFERATE, which, given a maximum integer,
returns a stream of integers from 1 to the maximum. The range set
maximum can be defined in Model in one of three ways. The maximum can
be a constant; it can be defined by the end-of-file condition; or it
can be defined by a range array. If the maximum is constant, the
constant is the input to PROLIFERATE. If the maximum is defined by
end-of-file, then the MaD function SIZE(<stream id>) is used. SIZE,
given the name of a stream, returns the number of elements in the
stream. If the maximum is defined by the Model range array SIZE, then
the range array is used as input to PROLIFERATE. If the maximum is
defined by the Model range array END, then the input to PROLIFERATE is
the number of elements in the dimension of the END array corresponding
to the Block range set.

For  example,  let  I  be  the  name  associated  with  a  range  set.  Let 
the  maximum  for  I  be  the  number  of  elements  in  a  file  A  consisting  of 
a  sequence  of  integers.  Then  the  first  part  of  the  <foreach> 
statement  in  MaD  is  as  follows: 

FOR EACH I IN PROLIFERATE(SIZE(A)) DO

In  this  example,  SIZE  is  the  MaD  function  which  returns  the  number  of 
elements  in  A.  PROLIFERATE  creates  a  stream  of  indices  from  one  to 
the  number  of  elements  in  A. 
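
The choice of argument to PROLIFERATE can be illustrated with a short Python sketch. The `kind` labels and the function name are invented for the example. The END range array case is left out because the text does not show how that element count is written in MaD.

```python
def foreach_header(range_name, kind, value):
    """Build the first line of a <foreach> statement.

    kind  -- how the range set maximum is defined (labels are assumptions):
             'constant'   : value is the constant maximum
             'eof'        : value is the name of the stream read to end-of-file
             'size_array' : value is the name of the Model SIZE range array
    """
    if kind == "constant":
        argument = str(value)
    elif kind == "eof":
        argument = f"SIZE({value})"
    elif kind == "size_array":
        argument = str(value)
    else:
        raise ValueError(f"unsupported range maximum kind: {kind}")
    return f"FOR EACH {range_name} IN PROLIFERATE({argument}) DO"
```

With `range_name="I"`, `kind="eof"` and `value="A"`, the sketch produces the header shown in the example above.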


The body of the <foreach> is generated by GenBlk. The procedure
ForEach calls GenBlk(B) to generate the definition statements which
constitute the body of B and the return statement.

VII. 5. 5  Procedure  Iter 

This procedure is called with input parameter Block, a block
description entry for an iterative block. Iter generates the 'WHILE'
statement for the iterative block. The form of the 'WHILE' statement
used is as follows:

<while> ::= 'INIT' <range set name> ':= 1;'
            'WHILE' <range set name> '<=' <range max> 'DO'
            [ 'NEW' ] <lhs> ':=' <expression>
            [ <lhs> ':=' <expression> ]+
            'RETURN' <expression>

The <range max> is found in the same way as described in ForEach
above. The body of the iteration is generated by Genblk(B).
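
As a rough illustration of the two loop headers emitted by ForEach and
Iter, the Python sketch below builds the header text from a subscript
name and a range maximum. The function names are hypothetical and the
exact spacing of the emitted text is an assumption; only the keywords
come from the forms given above.

```python
from typing import List


def foreach_header(subscript: str, range_max: str) -> str:
    """Header of a parallel block: FOR EACH <name> IN PROLIFERATE(<max>) DO."""
    return f"FOR EACH {subscript} IN PROLIFERATE({range_max}) DO"


def iter_header(subscript: str, range_max: str) -> List[str]:
    """Header of an iterative block: INIT the subscript, then WHILE <= max."""
    return [
        f"INIT {subscript} := 1;",
        f"WHILE {subscript} <= {range_max} DO",
    ]
```

For a range set named I with a constant maximum of 100,
foreach_header("I", "100") yields the FOR EACH line used in the
EXAMPLE program of this chapter.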

VII. 6  GENERATING  THE  RETURN  STATEMENT 

Procedure  Ret  constructs  the  Return  statement  for  a  block.  This 
procedure  is  called  with  Lhs,  the  list  of  data  nodes  generated  in  a 
block,  and  Level,  the  nesting  level.  For  each  element  D  of  Lhs,  Ret 
does  the  following: 

1. If D is a local data node, then skip it.

2. If the dimension of D corresponding to Level is not virtual, then
output 'ALL'.

3.  Output  the  name  of  the  local  copy  of  D. 
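
The three steps above can be sketched in Python as follows. The
DataNode record, its field names, and the comma used to join several
returned names are assumptions made for this illustration; they are
not the data structures of the actual generator.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class DataNode:
    name: str
    local: bool = False
    # Nesting levels at which this node has a real (non-virtual) dimension.
    real_levels: Set[int] = field(default_factory=set)


def gen_return(lhs: List[DataNode], level: int) -> str:
    """Build the RETURN statement for a block at the given nesting level."""
    parts = []
    for d in lhs:
        if d.local:                      # step 1: skip local data nodes
            continue
        prefix = "ALL " if level in d.real_levels else ""   # step 2
        parts.append(prefix + d.name)    # step 3: name of the local copy
    return "RETURN " + ", ".join(parts) + ";"


x = DataNode("X", local=True)
c = DataNode("C", real_levels={1})
```

With these two nodes, gen_return([x, c], 1) gives the statement for
the parallel block and gen_return([x, c], 0) gives the statement for
the enclosing simple block.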

VII.7 CODE GENERATION FOR EXAMPLE

Generating  the  MaD  assignment  statements  and  the  return  statement 
from  the  data  flow  template  is  illustrated  with  the  EXAMPLE  template. 

The block description entries for EXAMPLE are shown in Figure 7.4.

Block Description:

Block 1: Type=Simple Level=0 Range=0
    Data Nodes: None
    Block Members: {(m_name=Block 2 m_type=block)}

Block 2: Type=Parallel Level=1 Range=1
    Data Nodes: {(d_name=X d_dim=1) (d_name=C d_dim=1)}
    Block Members: {(m_name=Assertion 1 m_type=assertion)
                    (m_name=Assertion 2 m_type=assertion)}

Figure 7.4 Block Description Entries for EXAMPLE

Procedure Genblk is called with parameter Block 1. There are no
data nodes defined in Block 1, so Step 1 of Genblk is skipped. There
is one member of Block 1 of type "block", so Step 2 is invoked.
Findlhs is called with parameters Lhs=null and Block=Block 2.
Findlhs returns a list Lhs = (X, C) as the data nodes defined in
Block 2. The left hand side of the MaD definition for Block 2 is
generated from Lhs. The lhs is

C :=

X is not part of the lhs since it is a local data node.

Next, the right hand side of the definition is generated. "I" is the
name of the subscript associated with the range set for Block 2.

DECLARE I: INTEGER;

Local declarations are generated for each entry in Lhs:

X: INTEGER; C: INTEGER;

Since Block 2 is of type parallel, procedure ForEach is called with
parameter Block 2. This procedure generates

FOR EACH I IN PROLIFERATE(100) DO

ForEach then calls Genblk recursively with Block=Block 2. Genblk now
generates the definitions for the two assertions in Block 2 (Step 1 of
Genblk). The output of Genblk:

X := A[I] + B[I];
C := X * X;
RETURN ALL C;

ForEach returns to Genblk. Genblk generates the RETURN statement for
Block 1:

RETURN C;

The complete MaD program is shown below:

PROGRAM EXAMPLE(
    A: STREAM INTEGER;
    B: STREAM INTEGER
    ); STREAM INTEGER;

DECLARE
    C: STREAM INTEGER;
C := DECLARE
        I: INTEGER;
        X: INTEGER;
        C: INTEGER;
     FOR EACH I IN PROLIFERATE(100) DO
        X := A[I] + B[I];
        C := X * X;
     RETURN ALL C;
RETURN C
END
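
To make the meaning of the generated program concrete, the following
Python sketch evaluates the same definitions over two input lists. It
is a sequential simulation written for this discussion, not output of
the generator; the function name example and the parameter n (standing
in for the constant 100) are assumptions.

```python
from typing import List


def example(a: List[int], b: List[int], n: int) -> List[int]:
    """Simulate the EXAMPLE program: C[I] = (A[I] + B[I]) squared."""
    c_stream = []
    for i in range(1, n + 1):        # FOR EACH I IN PROLIFERATE(n) DO
        x = a[i - 1] + b[i - 1]      # X := A[I] + B[I]  (X is local)
        c = x * x                    # C := X * X
        c_stream.append(c)           # RETURN ALL C collects one C per I
    return c_stream                  # RETURN C


result = example([1, 2, 3], [4, 5, 6], 3)
```

Each pass through the loop uses only A[I] and B[I], so the iterations
are independent of one another; this independence is what allows the
incarnations of a parallel block to run concurrently.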


VII.8 CONCLUSION


This concludes the discussion of code generation to MaD. We have
shown how the data flow template is translated to a MaD language
program. The data description section of the template is used to
generate the program header and global variable declarations. The
block description section is processed recursively to generate the
assignment statements, either simple assignments or nested blocks.


VIII.1.2 Scheduling the Array Graph for Data Flow

The approach in scheduling has been to partition the array graph
into Maximally Strongly Connected Components (MSCC). Iterative blocks
are generated whenever possible for MSCCs derived from the original
array graph. Remaining nodes with common ranges in corresponding
dimensions are merged into graph components. From these components,
parallel blocks are generated. Each parallel block can be expanded
into multiple incarnations which can execute concurrently.

After  the  array  graph  has  been  partitioned,  the  Scheduler 
searches  for  dimensions  of  data  nodes  which  can  be  virtual.  Data 
nodes  local  to  a  block  and  data  nodes  produced  by  iterative  blocks  can 
potentially  be  reduced  in  dimension.  Dimension  reduction  results  in  a 
more  efficient  data  flow  program. 
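
The partitioning step can be illustrated with a standard strongly
connected components computation. The sketch below uses Tarjan's
algorithm, which is a swapped-in textbook technique and not
necessarily the method used by the Scheduler; the adjacency-list
dictionary and the node names are assumptions made for the example.

```python
from typing import Dict, List


def strongly_connected_components(graph: Dict[str, List[str]]) -> List[List[str]]:
    """Return the maximal strongly connected components (Tarjan's algorithm)."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack = set()
    components: List[List[str]] = []
    counter = [0]

    def visit(v: str) -> None:
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:           # v is the root of a component
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            components.append(sorted(comp))

    for v in graph:
        if v not in index:
            visit(v)
    return components


# Hypothetical dependency graph: a and b depend on each other, c is separate.
parts = strongly_connected_components({"a": ["b"], "b": ["a", "c"], "c": []})
```

A component containing a cycle, such as the pair a and b here,
corresponds to nodes that must be computed together, which is the
situation in which an iterative block would be generated.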

VIII.1.3 Generating Data Flow Programs

We have shown how the template generated by the Scheduler may be
used to generate programs in one specific data flow language, MaD.
The  generated  program  can  be  compiled  and  run  on  the  Manchester 


VIII. 2  FUTURE  RESEARCH 


Several promising avenues have presented themselves in the course
of this investigation. We summarize these areas:

1. The current Model system is embedded in the PL/I programming
environment. Creating a data flow version of the Model language with
constructs more suited to a data flow environment would provide a good
tool for programming data flow machines in a nonprocedural language.

2. In this work, we produce a data flow program partitioned into
blocks. An interesting area of study would be the static allocation
of these blocks to processors. Processor allocation would depend on
data generation and usage. The array graph contains information which
would facilitate this analysis.

3. Another interesting study would be a performance comparison of
sequential vs. data flow programs produced from the same
specifications.

4. Developing applications in Model suited to data flow is another
area of research. Such algorithms as graph-theoretic problems and
discrete event simulation could be written in Model and run on a data
flow machine.

5. Model produces iterative and parallel blocks from the
specification. An enhancement to Model currently being implemented is
to produce whole specifications as blocks and to analyze the data
dependencies among specifications. [ShiY82] describes this
distributed processing version of Model.


APPENDIX  A 


MAD BNF


<programme> ::= PROGRAM <program-id> [ <parameterlist> ]
                <header>
                [ <typedeclaration> ]
                [ <funcdefns> ]
                [ <expression> ]
                END
                [ <assembly-code> ]


<parameterlist> ::= '(' <parmlist> [ <parmlist> ]* ')'
<parmlist> ::= <parmid> [ ',' <parmid> ]* ':'
               [ [STORED] STREAM [STREAM]* ] <typeid>

<header> ::= '(' <header> [ ',' <header> ]* ')' |
             [ [STORED] STREAM ] <typeid>


<typedeclaration> ::= TYPE <typeid> '=' <typedefn>

<typedefn> ::= [STORED] <structyp> |
               [STORED] <structypeid> |
               <typeid>

<structyp> ::= STREAM [ <structypsys> ]* <structyp> |
               STREAM <structypeid> |
               STRUCT <gen> [ <gen> ]* ENDSTRUCT

<structypsys> ::= STREAM | STRUCT | SET

<gen> ::= <generatorid>
          [ '(' <typ> [ ',' <typ> ]* ')' ]

<typ> ::= <structypsys> <structyp> | <typeid>

<funcdefns> ::= FUNCTION <funcid> <parameterlist>
                <header> ';'
                [ <funcdefns> ] [ <expression> ] ';'

<expression> ::= DECLARE <block> |
                 IF <condexp> |
                 CASE <casexp> |
                 <basicexp>

<block> ::= <id> [ ',' <id> ]* ':' <typedefn>
            <legalblock>

<legalblock> ::= <id> [ ',' <id> ]* ':' <typedefn> ';' |
                 <let> |
                 <initforwhile>

<let> ::= LET [ <lhs> ':=' <expression> ]+
          RETURN <expression>

<lhs> ::= <id> | '[' <lhs> [ ',' <lhs> ]* ']'

<initforwhile> ::= [ INIT [ <lhs> ':=' <expression> ]+ ]
                   FOR EACH <id> IN <streamid>
                   [ ';' EACH <id> IN <streamid> ]*
                   DO
                   [ WHILE <expression> DO ]
                   [ [NEW]
                   <lhs> ':=' <expression> ';' ]+
                   RETURN <expression>

<condexp> ::= <expression> THEN <expression> ELSE <expression>

<casexp> ::= <id> OF
             [ <genid> [ '(' <parameters> ')' ]
             ':' <expression> ]+
             ENDCASE

<parameters> ::= <id> [ ',' <id> ]*

<basicexp> ::= <all-remainder> |
               <simpleexp> [ <relop> <simpleexp> ]

<all-remainder> ::= ALL <basicexp> [ BUT <basicexp> ] |
                    REMAINDER [ <id> ]

<simpleexp> ::= <term> <simpleops> <term>

<term> ::= <factor> <termops> <factor>

<simpleops> ::= '+' | '-' | OR | MAX | MIN

<termops> ::= '*' | DIV | MOD | AND

<factor> ::= <generatorid> [ '(' <parmid>
             [ ',' <parmid> ]* ')' ] |
             <simpleid> <qualifier> |
             <function> |
             '(' <basicexp> ')' |
             <constvalue> |
             '[' [ <basicexp> [ ',' <basicexp> ]* ]
             ']' |
             NOT <factor> |
             LAMBDA |
             <reductionops> '!' <factor>

<reductionops> ::= | '+' | AND | OR | MAX | MIN

<qualifier> ::= '[' <b-or-col> [ ',' <b-or-col> ]+
                [ ':' [ <basicexp> ] ] ']'

<b-or-col> ::= <basicexp> | ':'

<function> ::= <userdef> | <standard>

<userdef> ::= <funcid> '(' <basicexp> [ ',' <basicexp> ]* ')'

<standard> ::= <cons> '(' <basicexp> ',' <basicexp> ')' |
               <streamop> '(' <basicexp> ')' |
               CONCAT '(' <basicexp> ',' <basicexp> ')' |
               <oneargfns-i> '(' <basicexp> ')' |
               <oneargfns-r> '(' <basicexp> ')' |
               <exponentiation> '(' <basicexp> ','
               <basicexp> ')'

<cons> ::= CONSL | CONS

<streamop> ::= FIRST | REST | GET | EMPTY | SIZE

<oneargfns-i> ::= ISQRT | ABS | EVEN | ODD

<oneargfns-r> ::= SIN | COS | ARCSIN | ARCCOS | AU( | LN |
                  SQRT | ROUND | TRUNC

<exponentiation> ::= EXR | EXI


APPENDIX  B 


MODEL BNF


<MODEL_SPECIFICATION> ::= [ <MODEL_BODY_STMTS> ]*
                          <MODEL_SPECIFICATION>

<MODEL_BODY_STMTS> ::= MODULE <MODULE_NAME_STMT>
                     | SOURCE <SOURCE_FILES_STMT>
                     | TARGET <TARGET_FILES_STMT>
                     | «#_END*«
                     | <DCL_DESCRIPTION>
                     | <OLD_FILE_STMT>
                     | <SIMPLE_ASSERTION>

<DCL_DESCRIPTION> ::= 1 <DATA_SPEC>
                      [ , <INTEGER> <DATA_SPEC> ]* <ENDCHAR>

<DATA_SPEC> ::= <DCL_MVAR> [ ( <OCCSPEC> ) ] [ <IS> ]
                <ATTR_SPEC>

<ATTR_SPEC> ::= <FILE> <FILE_DESC> <STORAGE_DESC>
              | <RECORD>
              | <FIELD_STMT>
              | [ <GROUP> ]

<SIMPLE_ASSERTION> ::= <MVAR>
                       <BOOLEAN_EXPRESSION> <ENDCHAR>

<SUB_VARIABLE> ::= <VAR>
                   [ (
                   <BOOLEAN_EXPRESSION> [ ,
                   <BOOLEAN_EXPRESSION> ]*
                   ) ]

<SUB_VARIABLE1> ::= <VAR>
                    [ (
                    <BOOLEAN_EXPRESSION> [ ,
                    <BOOLEAN_EXPRESSION> ]*
                    ) ]

<BOOLEAN_EXPRESSION> ::= <COND_EXP> |
                         <BOOLEAN_TERM>
                         [ <OR> <BOOLEAN_TERM> ]*

<COND_EXP> ::= IF <BOOLEAN_EXPRESSION>
               THEN <BOOLEAN_EXPRESSION>
               [ ELSE <BOOLEAN_EXPRESSION> ]

<OR> ::= |

<BOOLEAN_TERM> ::= <BOOLEAN_FACTOR>
                   [ & <BOOLEAN_FACTOR> ]*

<BOOLEAN_FACTOR> ::= <CONCATENATION>
                     [ <RELATION> <CONCATENATION> ]

<RELATION> ::= = | ¬= | < | <= | > | >=

<CONCATENATION> ::= <ARITH_EXP>
                    [ <CONCAT> <ARITH_EXP> ]*

<CONCAT> ::= ||

<ARITH_EXP> ::= [ <SIGN> ]
                <TERM> [ <OPS> <TERM> ]*

<TERM> ::= <FACTOR>
           [ <MOPS> <FACTOR> ]*

<FACTOR> ::= [ ¬ ] <PRIMARY>
             [ <EXPON> <PRIMARY> ]*

<EXPON> ::= **

<PRIMARY> ::= <IS_PRIM>

<IS_PRIM> ::= ( <BOOLEAN_EXPRESSION> )
            | <NUMBER>
            | <STRING_FORM>
            | <FUNCTION_CALL>
            | <SUB_VARIABLE1>

<STRING_FORM> ::= ' [ <STRING> ]
                  ' [ B ]

<FUNCTION_CALL> ::= <FUNCTION_NAME>
                    [ ( <BOOLEAN_EXPRESSION>
                    [ , <BOOLEAN_EXPRESSION>
                    ]* ) ]

<MVAR> ::= ( <SUB_VARIABLE>
           [ , <SUB_VARIABLE> ]* )
         | <SUB_VARIABLE>

<VAR> ::= <NAME>
          [ . <NAME> ]* /STR-CON/

<DCL_MVAR> ::= ( <VAR> [ , <VAR> ]* )
             | <VAR>

<OPS> ::= + | -

<MOPS> ::= * | /
<MODULE_NAME_STMT> ::= <NAME>
                       <ENDCHAR>

<SOURCE_FILES_STMT> ::= [ <FILE_KEYWORD> ]
                        <SOURCE_FILELIST> <ENDCHAR>

<FILE_KEYWORD> ::= FILES | FILE

<SOURCE_FILELIST> ::= <NAME>
                      [ , <NAME> ]*

<TARGET_FILES_STMT> ::= [ <FILE_KEYWORD> ]
                        <TARGET_FILELIST> <ENDCHAR>

<TARGET_FILELIST> ::= <NAME>
                      [ , <NAME> ]*

<DATA_DESC_STMT> ::= <DATA_DESCRIPTION> <ENDCHAR>

<DATA_DESCRIPTION> ::= <FILE_STMT>
                     | <RECORD_STMT>
                     | <GROUP_STMT>
                     | <FIELD_STMT>
                     | <SUB_STMT>

<SUB_STMT> ::= <SUBSCRIPT> [ ( <OCCSPEC> ) ]

<SUBSCRIPT> ::= SUB | SUBSCRIPT | SUBSCRIPTS

<FILE> ::= FILE | REPORT | FILES | REPORTS

<RECORD_STMT> ::= <RECORD> [ ( ] <ITEM_LIST> [ ) ]

<RECORD> ::= REC | RECORD | RECORDS

<ITEM_LIST> ::= <ITEM> [ [ , ] <ITEM> ]*

<ITEM> ::= <NAME> [ . <NAME> ]* [ ( <OCCSPEC> ) ]

<OCCSPEC> ::= <STAR> | <MINOCC> [ <MAXOCC> ]

<STAR> ::= *

<MINOCC> ::= <INTEGER>

<MAXOCC> ::= [ : ] <INTEGER>
           | <INTEGER>

<GROUP_STMT> ::= <GROUP> [ ( ] <ITEM_LIST> [ ) ]

<GROUP> ::= GRP | GROUP | GROUPS

<FIELD_STMT> ::= <FIELD> <FIELD_ATTR>

<FIELD> ::= FLD | FIELD | FIELDS

<FIELD_ATTR> ::= [ ( ] <TYPE> [ <LENG_SPEC> ] [ ) ]

<LENG_SPEC> ::= ( <MIN_LENGTH> [ <MAX_LENGTH> ] )
              | <MIN_LENGTH> [ <MAX_LENGTH> ]

<MIN_LENGTH> ::= <INTEGER>

<TYPE> ::= <STRING_SPEC> | <NUM_SPEC>

<STRING_SPEC> ::= <STRING_TYPE>

<STRING_TYPE> ::= CHAR | CHARACTER | BIT | NUM | NUMERIC

<NUM_SPEC> ::= <NUM_TYPE> [ <FIXFLT> ]

<NUM_TYPE> ::= BIN | BINARY | DEC | DECIMAL

<FIXFLT> ::= FIX | FIXED | FL | FLOAT | FLT
<MAX_LENGTH> ::= [ : ] <INTEGER>
               | , <SINTGR>
               | <INTEGER>

<SINTGR> ::= - <INTEGER> | <INTEGER>

<SIGN> ::= + | -

<RECG> ::= <RECORD> | <GROUP>

<ENDCHAR> ::= ;

<END_CHAR> ::= ;

<IS> ::= IS | = | ARE

<FILE_STMT> ::= <FILE> <SON_DESC>
                <FILE_DESC> <STORAGE_DESC>

<SON_DESC> ::= ( <ITEM_LIST> )
             | <RECG> [ NAME ] [ <IS> ] [ ( ] <ITEM> [ ) ]

<OLD_FILE_STMT> ::= <FILE> [ NAME ] [ <IS> ]
                    <DCL_MVAR>
                    <RECG> [ NAME ] [ <IS> ] [ ( ] <ITEM> [ ) ]
                    <ENDCHAR>


APPENDIX  C 


EXAMPLES 


This  Appendix  contains  examples  of  the  translation  of  Model 
specifications  to  MaD.  For  each  example  the  following  reports  are 
reproduced  below: 

-  Listing  of  Specification 

-  Block  Description 

-  MaD  Program 
C.1 MATRIX MULTIPLY

This  example  is  the  familiar  matrix  multiply  program  from 
Chapter  1.  A  and  B  are  the  two  10x10  input  matrices  to  be 
multiplied.  C  is  the  10x10  output  matrix  result.  In  a  new 
dialect  of  Model  under  development,  even  the  two  assertions 
defining the multiplication will not be needed. The dialect
supports  matrix  operations  at  the  source  level,  so  that  the 
operator  I*  indicates  matrix  multiplication  [LiuW82]. 

It  should  be  noted  that  the  generated  MaD  program  does  not 
use  a  "transpose"  function  as  does  the  Id  function  for  matrix 
multiply  in  Chapter  1.  The  matrix  B  input  to  the  Id  version  of 
the  program  is  transposed  and  the  transposed  array  is  input  to 
function  mmt.  Doing  so  creates  a  stream,  the  column  of  B.  The 
inner product of a row of A with a column of B can then be
computed within a simple loop. In the MaD implementation a loop
is not needed in order to compute the inner product. The partial
products are computed in INTERIM_X, and the reduction operator +!
is used to add all the partial products. In addition, MaD stores
each dimension after the first of a multi-dimension structure as
a random access permanent structure in the Matching Store. Each
element can be accessed repeatedly with no cost other than the
cost of selecting a single element. The entire array need not be
duplicated. Therefore, the generated MaD program does not use a
transpose function.
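
The two-step formulation used in this example, partial products
followed by a sum reduction over K, can be sketched in Python as
follows. The function name and the nested-list representation of the
matrices are assumptions for illustration; the comments quote the two
assertions of the specification.

```python
from typing import List

Matrix = List[List[int]]


def matmul_via_partial_products(a: Matrix, b: Matrix) -> Matrix:
    """Multiply a (n x m) by b (m x p) using explicit partial products."""
    n, m, p = len(a), len(b), len(b[0])
    # X(I,J,K) = A(I,K) * B(K,J)
    x = [[[a[i][k] * b[k][j] for k in range(m)]
          for j in range(p)]
         for i in range(n)]
    # C(I,J) = SUM(X(I,J,K), K), the role played by the +! reduction
    return [[sum(x[i][j]) for j in range(p)] for i in range(n)]


product = matmul_via_partial_products([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Every partial product depends only on one element of A and one of B,
so no transpose of B is required to form them.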
MODULE: MM;
SOURCE: INFILE1, INFILE2;
TARGET: OUTFILE;

INFILE1 IS FILE (INREC1);
INREC1 IS RECORD (IN1(10));
IN1 IS GROUP (A(10));
A IS FIELD (NUMERIC);

INFILE2 IS FILE (INREC2);
INREC2 IS RECORD (IN2(10));
IN2 IS GROUP (B(10));
B IS FIELD (NUMERIC);

OUTFILE IS FILE (OUTREC);
OUTREC IS RECORD (OUT1(10));
OUT1 IS GROUP (C(10));
C IS FIELD (NUMERIC);

XD IS GROUP (X1(10));
X1 IS GROUP (X2(10));
X2 IS GROUP (X(10));
X IS FIELD (NUMERIC);

I IS SUBSCRIPT (10);
J IS SUBSCRIPT (10);
K IS SUBSCRIPT (10);

X(I,J,K) = A(I,K) * B(K,J);
C(I,J) = SUM(X(I,J,K), K);


Block Description

Block 1
    SIMP Level: 0 Range: 0 # Data nodes: 2 # Block Members: 1
    Data Nodes:
        INTERIM.X for dimension 0 in block 1
        OUTFILE.C for dimension 0 in block 1
    Block Members:
        2 BLOCK
Block 2
    PARA Level: 1 Range: 1 # Data nodes: 2 # Block Members: 1
    Data Nodes:
        INTERIM.X for dimension 3 in block 1
        OUTFILE.C for dimension 2 in block 1
    Block Members:
        3 BLOCK
Block 3
    PARA Level: 2 Range: 2 # Data nodes: 2 # Block Members: 2
    Data Nodes:
        INTERIM.X for dimension 2 in block 1
        OUTFILE.C for dimension 1 in block 2
    Block Members:
        4 BLOCK
        5 BLOCK
Block 4
    PARA Level: 3 Range: 3 # Data nodes: 1 # Block Members: 1
    Data Nodes:
        INTERIM.X for dimension 1 in block 1
    Block Members:
        AASS300 ASSERTION
Block 5
    PARA Level: 3 Range: 3 # Data nodes: 1 # Block Members: 1
    Data Nodes:
        OUTFILE.C for dimension -2 in block 1
    Block Members:
        AASS310 ASSERTION

Local Data Nodes

Node INTERIM.X is local to block 3


MaD Program

PROGRAM MM(
    INFILE1_A: STREAM STREAM INTEGER
    ; INFILE2_B: STREAM STREAM INTEGER
    ):
    OUTFILE_C: STREAM STREAM INTEGER
    ;
OUTFILE_C := DECLARE
    OUTFILE_C: STREAM INTEGER ;
    I_1: INTEGER;
    FOR EACH I_1 IN PROLIF(10) DO
    OUTFILE_C := DECLARE
        INTERIM_X: INTEGER ;
        OUTFILE_C: INTEGER ;
        I_2: INTEGER;
        FOR EACH I_2 IN PROLIF(10) DO
        INTERIM_X := DECLARE
            INTERIM_X: INTEGER ;
            I_3: INTEGER;
            FOR EACH I_3 IN PROLIF(10) DO
            INTERIM_X := INFILE1_A[I_1,I_3]*INFILE2_B[I_3,I_2];
            RETURN ALL INTERIM_X ;
        OUTFILE_C := +! INTERIM_X[SUB3,SUB2];
        RETURN ALL OUTFILE_C ;
    RETURN ALL OUTFILE_C ;
RETURN OUTFILE_C ;


C.2 TRAPEZOIDAL INTEGRATION


In  this  example,  the  area  under  a  curve  in  an  interval  [A, B] 
is  computed  using  the  trapezoidal  method.  Input  parameters  are 
the  coefficients  of  the  quadratic  function  to  be  integrated;  the 
interval  limits  A  and  B;  the  width  of  each  subinterval  H;  and 
the  number  of  subintervals  N.  The  interim  array  X  holds  the  X 
values for each subinterval. The array F holds the corresponding
function  values.  The  area  is  computed  in  array  S.  This 
specification,  the  FFT,  and  the  explanation  of  the  FFT  were 
written  by  nr.  Chi-Ming  Chen  of  the  University  of  Pennsylvania. 
I  am  grateful  for  his  help. 
MODULE: INTEG;
SOURCE: INFILE;
TARGET: OUTFILE;

INFILE IS FILE (INREC);
INREC IS RECORD (IN(6));
IN IS FIELD (NUMERIC);

A IS FIELD (NUMERIC);
B IS FIELD (NUMERIC);
H IS FIELD (NUMERIC);
N IS FIELD (NUMERIC);
COEFF1 IS FIELD (NUMERIC);
COEFF2 IS FIELD (NUMERIC);
COEFF3 IS FIELD (NUMERIC);

S1 IS GROUP (S(1:100));
S IS FIELD (NUMERIC);
F1 IS GROUP (F(1:100));
F IS FIELD (NUMERIC);
X1 IS GROUP (X(*));
X IS FIELD (NUMERIC);
FA IS FIELD (NUMERIC);

OUTFILE IS FILE (OUTREC);
OUTREC IS RECORD (OUT);
OUT IS FIELD (NUMERIC);

I IS SUBSCRIPT (100);

A = IN(1);
H = IN(2);
N = IN(3);
COEFF1 = IN(4);
COEFF2 = IN(5);
COEFF3 = IN(6);
FA = COEFF1 * A * A + COEFF2 * A + COEFF3;
S(I) = IF I = 1 THEN (FA + F(N))/2
       ELSE S(I-1) + F(I-1);
F(I) = COEFF1 * X(I) * X(I) + COEFF2 * X(I) + COEFF3;
X(I) = IF I = 1 THEN A + H
       ELSE X(I-1) + H;
OUT = S(N);

END INTEG;
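
The recurrences in this specification can be followed step by step in
the Python sketch below. It is an illustration under stated
assumptions: the function name and parameter names are invented here,
and the final multiplication by H is added because the standard
trapezoidal rule requires it, whereas the listing above sets OUT to
S(N) without a visible scaling step.

```python
from typing import List


def trapezoid(a: float, h: float, n: int,
              c1: float, c2: float, c3: float) -> float:
    """Trapezoidal area of c1*x^2 + c2*x + c3 over n subintervals of width h."""
    def quad(x: float) -> float:
        return c1 * x * x + c2 * x + c3

    fa = quad(a)                                  # FA
    x: List[float] = []
    for i in range(1, n + 1):                     # X(I) = A+H or X(I-1)+H
        x.append(a + h if i == 1 else x[-1] + h)
    f = [quad(xi) for xi in x]                    # F(I)
    s = (fa + f[n - 1]) / 2                       # S(1) = (FA + F(N))/2
    for i in range(2, n + 1):
        s = s + f[i - 2]                          # S(I) = S(I-1) + F(I-1)
    return s * h                                  # scale the sum by H (assumed)


area = trapezoid(0.0, 0.25, 4, 1.0, 0.0, 0.0)
```

The X recurrence carries a value from one step to the next, which is
why X and S are generated as iterative blocks, while F depends only on
X(I) and is generated as a parallel block.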


Block Description

Block 1
    SIMP Level: 0 Range: 0 # Data nodes: 11 # Block Members: 11
    Data Nodes:
        INTERIM.COEFF3 for dimension 0 in block 1
        INTERIM.COEFF2 for dimension 0 in block 2
        INTERIM.COEFF1 for dimension 0 in block 3
        INTERIM.N for dimension 0 in block 4
        INTERIM.H for dimension 0 in block 5
        INTERIM.A for dimension 0 in block 6
        INTERIM.FA for dimension 0 in block 7
        INTERIM.X for dimension 0 in block 8
        INTERIM.F for dimension 0 in block 9
        INTERIM.S for dimension 0 in block 10
        OUTFILE.OUT for dimension 0 in block 11
    Block Members:
        AASS380 ASSERTION
        AASS370 ASSERTION
        AASS360 ASSERTION
        AASS350 ASSERTION
        AASS340 ASSERTION
        AASS330 ASSERTION
        AASS390 ASSERTION
        2 BLOCK
        3 BLOCK
        4 BLOCK
        AASS450 ASSERTION
Block 2
    ITER Level: 1 Range: 1 # Data nodes: 1 # Block Members: 1
    Data Nodes:
        INTERIM.X for dimension 1 in block 1
    Block Members:
        AASS430 ASSERTION
Block 3
    PARA Level: 1 Range: 1 # Data nodes: 1 # Block Members: 1
    Data Nodes:
        INTERIM.F for dimension 1 in block 1
    Block Members:
        AASS420 ASSERTION
Block 4
    ITER Level: 1 Range: 1 # Data nodes: 1 # Block Members: 1
    Data Nodes:
        INTERIM.S for dimension 1 in block 1


MaD Program

PROGRAM INTEG(
    INFILE_IN: STREAM INTEGER
    ):
    OUTFILE_OUT: INTEGER
    ;
DECLARE
    INTERIM_X: STREAM INTEGER;
    INTERIM_S: STREAM INTEGER;
    INTERIM_F: STREAM INTEGER;
    INTERIM_B: INTEGER;
INTERIM_COEFF3 := INFILE_IN[6];
INTERIM_COEFF2 := INFILE_IN[5];
INTERIM_COEFF1 := INFILE_IN[4];
INTERIM_N := INFILE_IN[3];
INTERIM_H := INFILE_IN[2];
INTERIM_A := INFILE_IN[1];
INTERIM_FA := INTERIM_COEFF1*INTERIM_A*INTERIM_A+
    INTERIM_COEFF2*INTERIM_A+INTERIM_COEFF3;
INTERIM_X := DECLARE
    INTERIM_X: INTEGER ;
    I_1: INTEGER;
    INIT
    INTERIM_X:=0;
    WHILE I_1<=100 DO
    NEW I_1:=I_1+1;
    NEW INTERIM_X := IF I_1=1 THEN INTERIM_A+INTERIM_H
        ELSE INTERIM_X+INTERIM_H;
    RETURN ALL NEW INTERIM_X ;
INTERIM_F := DECLARE
    INTERIM_F: INTEGER ;
    I_1: INTEGER;
    FOR EACH I_1 IN PROLIF(100) DO
    INTERIM_F := INTERIM_COEFF1*INTERIM_X[I_1]*INTERIM_X[I_1]
        +INTERIM_COEFF2*INTERIM_X[I_1]+INTERIM_COEFF3
    RETURN ALL INTERIM_F ;
INTERIM_S := DECLARE
    INTERIM_S: INTEGER ;
    I_1: INTEGER;
    INIT
    I_1:=1;
    INTERIM_S:=0;
    WHILE I_1<=100 DO
    NEW I_1:=I_1+1;
    NEW INTERIM_S := IF I_1=1 THEN
        (INTERIM_FA+INTERIM_F[INTERIM_N])/2
C.3 FAST FOURIER TRANSFORM

This module specifies the computation of the Fourier Transform
of N=16 sample data taken from the function

\[ f(t) = e^{-t} \sin(t) \]

with the sampling interval T = 0.375. The FORTRAN version of the
program can be found in Appendix B of [Stea75].

The purpose of the FFT is to calculate the DFT using N log N
multiplications rather than N^2 multiplications, where N is the
number of sample data points.

Let f(t) be the time function for the sample data and f_n be
the nth sample. Then the DFT of the data is given by

\[ F_m = \sum_{n=0}^{N-1} f_n \, e^{-j 2\pi m n / N}
       = \sum_{n=0}^{N-1} f_n \, W_N^{mn},
   \qquad m = 0, \ldots, N-1,
   \qquad \text{where } W_N = e^{-j 2\pi / N} \]

Because of the cyclic property of the exponential function

\[ W_N^{k} = W_N^{cN+k} \qquad \text{where } c \text{ is any integer} \]

We can find {F_m | 0 <= m <= N-1} together by q decompositions if we
choose N = 2^q. For simplicity, the method is explained using
N = 2^3. The method described is that of computing the FFT using
time decomposition with input bit reversal.
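
The DFT definition and the cyclic property of the powers of W_N can be
checked numerically. The following Python sketch is a direct N^2
evaluation written for illustration; the function names twiddle and
dft are assumptions and do not appear in the specification.

```python
import cmath
from typing import List


def twiddle(n: int, k: int) -> complex:
    """Return W_N^k = e^(-j*2*pi*k/N)."""
    return cmath.exp(-2j * cmath.pi * k / n)


def dft(f: List[complex]) -> List[complex]:
    """Direct DFT: F_m = sum over t of f_t * W_N^(m*t), for m = 0..N-1."""
    n = len(f)
    return [sum(f[t] * twiddle(n, m * t) for t in range(n))
            for m in range(n)]


# Cyclic property: W_8^3 equals W_8^(2*8 + 3) up to rounding error.
cyclic_gap = abs(twiddle(8, 3) - twiddle(8, 2 * 8 + 3))
impulse_spectrum = dft([1, 0, 0, 0])
```

Because only N distinct powers of W_N exist, the FFT can reuse them
across stages instead of recomputing one exponential per product.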
In order to reduce the number of multiplications, the
sequence of inputs should be reordered, for example by the input
bit reversal method. This is illustrated in Figure C.1.

input      0    1    2    3    4    5    6    7
pattern   000  001  010  011  100  101  110  111
reversal  000  100  010  110  001  101  011  111
result     0    4    2    6    1    5    3    7

Figure C.1 Input Bit Reversal
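
The reordering in Figure C.1 can be reproduced with a short Python
sketch. The helper name bit_reverse is an assumption; it reverses the
low-order bits of an index, which is the operation tabulated in the
figure.

```python
def bit_reverse(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of the non-negative integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)   # move the lowest bit of i onto the end of r
        i >>= 1
    return r


# N = 2^3 inputs, so three bits per index.
order = [bit_reverse(i, 3) for i in range(8)]
```

The list order gives, for each position, the index of the input sample
that is placed there before the first stage.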

Using the obtained sequence, we can calculate {F_m}. Each F_m is
calculated through 2 intermediate stages, G_{i1} and G_{i2},
i = 1, ..., 8. For example,

\[ G_{11} = f_0 + W_8^0 f_4 \]
\[ G_{21} = f_0 + W_8^4 f_4 \]
\[ G_{31} = f_2 + W_8^0 f_6 \]
\[ G_{41} = f_2 + W_8^4 f_6 \]

and the result for the second stage is given by

\[ G_{12} = G_{11} + W_8^0 G_{31} \]
\[ G_{22} = G_{21} + W_8^2 G_{41} \]
\[ G_{32} = G_{11} + W_8^4 G_{31} \]
\[ G_{42} = G_{21} + W_8^6 G_{41} \]

That is, the node is obtained by two nodes in the previous stage
and some power of W_N.
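
The staged computation described above, in which each node combines
two nodes of the previous stage with a power of W_N, can be sketched
in Python as an in-place radix-2 decimation-in-time FFT. This is a
textbook formulation offered for illustration, not a transcription of
the Model specification; the function names are assumptions, and the
direct DFT is included only so the result can be checked.

```python
import cmath
from typing import List


def bit_reverse(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft(samples: List[complex]) -> List[complex]:
    """Radix-2 FFT with input bit reversal; len(samples) must be 2^q."""
    n = len(samples)
    q = n.bit_length() - 1
    assert n >= 1 and (1 << q) == n, "length must be a power of two"
    g = [samples[bit_reverse(i, q)] for i in range(n)]
    for stage in range(1, q + 1):
        p = 1 << stage              # group size, 2**J
        half = p >> 1               # distance between paired nodes, 2**(J-1)
        for start in range(0, n, p):
            for k in range(half):
                # Power of W_N used by this pair: k * (N / p)
                w = cmath.exp(-2j * cmath.pi * k * (n // p) / n)
                a = g[start + k]
                b = g[start + k + half] * w
                g[start + k] = a + b
                g[start + k + half] = a - b   # same as a + W^(k + p/2) * node
    return g


def dft(f: List[complex]) -> List[complex]:
    """Direct N^2 DFT used as a reference."""
    n = len(f)
    return [sum(f[t] * cmath.exp(-2j * cmath.pi * m * t / n) for t in range(n))
            for m in range(n)]


data = [1, 2, 3, 4, 5, 6, 7, 8]
fast, slow = fft(data), dft(data)
```

With N = 8 the loop runs three stages, and within a stage every pair
of nodes is independent of the others.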

In the MODEL specification, BRS(I) gives the input bit
reversal sequence. WR(I) and WI(I) give the real and imaginary
parts respectively of W_N^k, k = 0, 1, ..., N-1. WP(J,I) gives
the power k in W_N^k for the Ith node in the Jth stage. GP1(J,I)
and GP2(J,I) give the first and second node numbers respectively
from the (J-1)th stage for the Ith node in the Jth stage.
GR(J,I) and GI(J,I) give the real and imaginary parts
respectively of the values for the Ith node at stage J. Then,
finally, GR(log N, I) and GI(log N, I) are the real and imaginary
parts of the Fourier transform FI.
MODULE: FFTMOD;
TARGET: OUTFILE1, OUTFILE2;

OUTFILE1 IS FILE (OUTREC1(16));
OUTREC1 IS RECORD (OUTR);
OUTFILE2 IS FILE (OUTREC2(16));
OUTREC2 IS RECORD (OUTI);
(OUTR,OUTI) IS FIELD (BIN FIXED);

G0 IS GROUP (G0REC(16));
G0REC IS GROUP (T,F,ARG,WR,WI,FR,BRS);
(T,F,ARG,WR,WI,FR) IS FIELD (BINARY FLOAT);
BRS IS FIELD (BIN FIXED);

G1 IS GROUP (G1REC1(4));
G1REC1 IS GROUP (P,Q);
(P,Q) IS FIELD (BIN FIXED);

G2 IS GROUP (G2REC1(4));
G2REC1 IS GROUP (G2REC2(16));
G2REC2 IS GROUP (BR,WP,GP1,GP2,GR,GI);
(BR,WP,GP1,GP2) IS FIELD (BIN FIXED);
(GR,GI) IS FIELD (BINARY FLOAT);

(NB,N,NH) IS FIELD (BIN FIXED);

I IS SUBSCRIPT (16);
J IS SUBSCRIPT (4);

NB = 4;
N = 2**NB;
NH = N/2;
P(J) = 2**J;
Q(J) = 2**(J-1);
T(I) = 0.375*(I-1);
F(I) = EXP(-T(I))*SSIN(T(I));
BR(J,I) = IF J=1 THEN MOD(I-1,2)*NH
          ELSE (MOD(I-1,P(J))/P(J-1))*(N/P(J));
BRS(I) = SUM(BR(J,I),J);
FR(I) = F(BRS(I)+1);
ARG(I) = 3.1415926535*(1-I)/NH;
WR(I) = IF I <= NH THEN CCOS(ARG(I)) ELSE -WR(I-8);
WI(I) = IF I <= NH THEN SSIN(ARG(I)) ELSE -WI(I-8);
WP(J,I) = MOD((I-1),P(J))*(N/P(J))+1;
GP1(J,I) = IF MOD(I-1,P(J)) < Q(J) THEN I
           ELSE I-Q(J);
GP2(J,I) = GP1(J,I) + Q(J);
GR(J,I) = IF J=1 THEN IF MOD(I,2) = 1 THEN FR(I) +
Block Description:

Block 1
    SIMP Level: 0 Range: 0 # Data nodes: 20 # Block Members: 13
    Data Nodes:
        INTERIM.T for dimension 0 in block 1
        INTERIM.F for dimension 0 in block 1
        INTERIM.NB for dimension 0 in block 2
        INTERIM.N for dimension 0 in block 3
        INTERIM.NH for dimension 0 in block 4
        INTERIM.ARG for dimension 0 in block 5
        INTERIM.WR for dimension 0 in block 6
        INTERIM.WI for dimension 0 in block 7
        INTERIM.Q for dimension 0 in block 8
        INTERIM.P for dimension 0 in block 8
        INTERIM.WP for dimension 0 in block 8
        INTERIM.GP1 for dimension 0 in block 8
        INTERIM.BR for dimension 0 in block 9
        INTERIM.BRS for dimension 0 in block 9
        INTERIM.FR for dimension 0 in block 9
        INTERIM.GP2 for dimension 0 in block 10
        INTERIM.GR for dimension 0 in block 11
        INTERIM.GI for dimension 0 in block 11
        OUTFILE1.OUTR for dimension 0 in block 12
        OUTFILE2.OUTI for dimension 0 in block 13
    Block Members:
        2 BLOCK
        AASS290 ASSERTION
        AASS300 ASSERTION
        AASS310 ASSERTION
        3 BLOCK
        4 BLOCK
        5 BLOCK
        6 BLOCK
        9 BLOCK
        12 BLOCK
        14 BLOCK
        17 BLOCK
        18 BLOCK
Block 2
    PARA Level: 1 Range: 1 # Data nodes: 2 # Block Members: 2
    Data Nodes:
        INTERIM.T for dimension 1 in block 1
        INTERIM.F for dimension 1 in block 2
    Block Members:
        AASS340 ASSERTION


        AASS350 ASSERTION
Block 3
    PARA Level: 1 Range: 1 # Data nodes: 1 # Block Members: 1
    Data Nodes:
        INTERIM.ARG for dimension 1 in block 1
    Block Members:
        AASS400 ASSERTION
Block 4
    ITER Level: 1 Range: 1 # Data nodes: 1 # Block Members: 1
    Data Nodes:
        INTERIM.WR for dimension 1 in block 1
    Block Members:
        AASS410 ASSERTION
Block 5
    ITER Level: 1 Range: 1 # Data nodes: 1 # Block Members: 1
    Data Nodes:
        INTERIM.WI for dimension 1 in block 1
    Block Members:
        AASS420 ASSERTION
Block 6
    PARA Level: 1 Range: 2 # Data nodes: 4 # Block Members: 4
    Data Nodes:
        INTERIM.Q for dimension 1 in block 1
        INTERIM.P for dimension 1 in block 2
        INTERIM.WP for dimension 2 in block 3
        INTERIM.GP1 for dimension 2 in block 4
    Block Members:
        AASS330 ASSERTION
        AASS320 ASSERTION
        7 BLOCK
        8 BLOCK
Block 7
    PARA Level: 2 Range: 1 # Data nodes: 1 # Block Members: 1
    Data Nodes:
        INTERIM.WP for dimension 1 in block 1
    Block Members:
        AASS430 ASSERTION
Block 8
    PARA Level: 2 Range: 1 # Data nodes: 1 # Block Members: 1
    Data Nodes:
        INTERIM.GP1 for dimension 1 in block 1
    Block Members:
        AASS440 ASSERTION
Block 9
    PARA Level: 1 Range: 1 # Data nodes: 3 # Block Members: 3
    Data Nodes:
        INTERIM.BR for dimension 1 in block 1
        INTERIM.BRS for dimension 1 in block 2
        INTERIM.FR for dimension 1 in block 3
    Block Members:
        10 BLOCK
        11 BLOCK
        AASS390 ASSERTION
Block 10
    PARA Level: 2 Range: 2 # Data nodes: 1 # Block Members: 1
    Data Nodes:
        INTERIM.BR for dimension 2 in block 1
    Block Members:
        AASS360 ASSERTION
Block 11
    PARA Level: 2 Range: 2 # Data nodes: 1 # Block Members: 1
    Data Nodes:
        INTERIM.BRS for dimension -1 in block 1
    Block Members:
        AASS380 ASSERTION
Block 12

PARA Level: 1 Range: 1 # Data nodes: 1 # Block Members: 1
Data Nodes:
  INTERIM.GP2 for dimension 1 in block 1
Block Members:
  13 BLOCK
Block 13

PARA Level: 2 Range: 2 # Data nodes: 1 # Block Members: 1
Data Nodes:
  INTERIM.GP2 for dimension 2 in block 1
Block Members:
  AASS460 ASSERTION
Block 14

ITER Level: 1 Range: 2 # Data nodes: 2 # Block Members: 2
Data Nodes:
  INTERIM.GR for dimension 2 in block 1
  INTERIM.GI for dimension 2 in block 2
Block Members:
  15 BLOCK
  16 BLOCK
Block 15

PARA Level: 2 Range: 1 # Data nodes: 1 # Block Members: 1
Data Nodes:
  INTERIM.GR for dimension 1 in block 1
Block Members:
  AASS470 ASSERTION
Block 16

PARA Level: 2 Range: 1 # Data nodes: 1 # Block Members: 1
Data Nodes:
  INTERIM.GI for dimension 1 in block 1
Block Members:
  AASS510 ASSERTION
Block 17

PARA Level: 1 Range: 1 # Data nodes: 1 # Block Members: 1
Data Nodes:
  OUTFILE1.OUTR for dimension 1 in block 1
Block Members:
  AASS540 ASSERTION
Block 18

PARA Level: 1 Range: 1 # Data nodes: 1 # Block Members: 1
Data Nodes:
  OUTFILE2.OUTI for dimension 1 in block 1
Block Members:
  AASS550 ASSERTION
Local Data Nodes

Node INTERIM.NH is local to block 1
Node INTERIM.NB is local to block 1
Node INTERIM.N is local to block 1
Node INTERIM.BR is local to block 9
Node INTERIM.WP is local to block 1
Node INTERIM.GP1 is local to block 1
Node INTERIM.GP2 is local to block 1
Node INTERIM.Q is local to block 1
Node INTERIM.T is local to block 2
Node INTERIM.ARG is local to block 1
Node INTERIM.BRS is local to block 9


Mao Program


PROGRAM  FPTHXK 

>*C 

OUTFILE1_OUTR: STREAM INTEGER
;OUTFILE2_OUTI: STREAM INTEGER
1# 

DECLARE 

INTERIM_GR: STREAM STREAM REAL;
INTERIM_GI: STREAM STREAM REAL;
INTERIM_P: STREAM INTEGER;
INTERIM_F: STREAM REAL;
INTERIM_WR: STREAM REAL;
INTERIM_WI: STREAM REAL;
INTERIM_FR: STREAM REAL;

INTERIM_F := DECLARE
  INTERIM_T: REAL ;
  INTERIM_F: REAL ;
  I_1: INTEGER;
  FOR EACH I_1 IN PROLIF(16) DO
    INTERIM_T := 0.375*(I_1-1);
    INTERIM_F := EXP((-INTERIM_T))*SSIN(INTERIM_T);
  RETURN ALL INTERIM_F ;

INTERIM_NB := 4;
INTERIM_N := 2**INTERIM_NB;
INTERIM_NH := INTERIM_N/2;

INTERIM_ARG := DECLARE
  INTERIM_ARG: REAL ;
  I_1: INTEGER;
  FOR EACH I_1 IN PROLIF(16) DO
    INTERIM_ARG := 3.1415926535*(1-I_1)/INTERIM_NH;
  RETURN ALL INTERIM_ARG ;

INTERIM_WR := DECLARE
  INTERIM_WR: REAL ;
  I_1: INTEGER;
  INIT
    I_1 := 1;
    INTERIM_WR := 0;
  WHILE I_1<=16 DO
    NEW I_1 := I_1+1;
    NEW INTERIM_WR := IF I_1<=INTERIM_NH THEN CCOS(INTERIM_ARG[I_1])
                      ELSE -INTERIM_WR[I_1-8];
  RETURN ALL NEW INTERIM_WR ;

INTERIM_WI := DECLARE
  INTERIM_WI: REAL ;
  I_1: INTEGER;
  INIT
    I_1 := 1;
    INTERIM_WI := 0;
  WHILE I_1<=16 DO
    NEW I_1 := I_1+1;
    NEW INTERIM_WI := IF I_1<=INTERIM_NH THEN SSIN(INTERIM_ARG[I_1])
                      ELSE -INTERIM_WI[I_1-8];
  RETURN ALL NEW INTERIM_WI ;

[INTERIM_Q, INTERIM_P, INTERIM_WP, INTERIM_GP1] := DECLARE
  INTERIM_Q: INTEGER ;
  INTERIM_P: INTEGER ;
  INTERIM_WP: STREAM INTEGER ;
  INTERIM_GP1: STREAM INTEGER ;
  I_2: INTEGER;
  FOR EACH I_2 IN PROLIF(4) DO
    INTERIM_Q := 2**(I_2-1);
    INTERIM_P := 2**I_2;
    INTERIM_WP := DECLARE
      INTERIM_WP: INTEGER ;
      I_1: INTEGER;
      FOR EACH I_1 IN PROLIF(16) DO
        INTERIM_WP := MOD((I_1-1), INTERIM_P)*(INTERIM_N/INTERIM_P)+1;
      RETURN ALL INTERIM_WP ;
    INTERIM_GP1 := DECLARE
      INTERIM_GP1: INTEGER ;
      I_1: INTEGER;
      FOR EACH I_1 IN PROLIF(16) DO
        INTERIM_GP1 := IF MOD((I_1-1), INTERIM_P) < INTERIM_Q THEN I_1 ELSE
                       I_1-INTERIM_Q;
      RETURN ALL INTERIM_GP1 ;
  RETURN [ALL INTERIM_Q, ALL INTERIM_P, ALL INTERIM_WP, ALL
          INTERIM_GP1];

INTERIM_FR := DECLARE
  INTERIM_BR: INTEGER ;
  INTERIM_BRS: INTEGER ;
  INTERIM_FR: REAL ;
  I_1: INTEGER;
  FOR EACH I_1 IN PROLIF(16) DO
    INTERIM_BR := DECLARE
      INTERIM_BR: INTEGER ;
      I_2: INTEGER;
      FOR EACH I_2 IN PROLIF(4) DO
        INTERIM_BR := IF I_2=1 THEN MOD((I_1-1), 2)*INTERIM_NH
                      ELSE (MOD((I_1-1), INTERIM_P[I_2])/
                      INTERIM_P[I_2-1])*(INTERIM_N/INTERIM_P[I_2]);
      RETURN ALL INTERIM_BR ;
    INTERIM_BRS := INTERIM_BR[SUB1] ;
    INTERIM_FR := INTERIM_F[INTERIM_BRS+1] ;
  RETURN ALL INTERIM_FR ;

INTERIM_GP2 := DECLARE
  INTERIM_GP2: STREAM INTEGER ;
  I_1: INTEGER;
  FOR EACH I_1 IN PROLIF(16) DO
    INTERIM_GP2 := DECLARE
      INTERIM_GP2: INTEGER ;
      I_2: INTEGER;
      FOR EACH I_2 IN PROLIF(4) DO
        INTERIM_GP2 := INTERIM_GP1[I_2, I_1]+INTERIM_Q[I_2];
      RETURN ALL INTERIM_GP2 ;
  RETURN ALL INTERIM_GP2 ;

[INTERIM_GR, INTERIM_GI] := DECLARE
  INTERIM_GR: REAL ;
  INTERIM_GI: REAL ;
  I_2: INTEGER;
  INIT
    I_2 := 1;
    INTERIM_GR := 0;
    INTERIM_GI := 0;
  WHILE I_2<=4 DO
    NEW I_2 := I_2+1;
    NEW INTERIM_GR := DECLARE
      INTERIM_GR: REAL ;
      I_1: INTEGER;
      FOR EACH I_1 IN PROLIF(16) DO
        INTERIM_GR := IF I_2=1 THEN IF MOD(I_1, 2)=1
                        THEN INTERIM_FR[I_1]+INTERIM_FR[I_1+1]
                        ELSE INTERIM_FR[I_1-1]-INTERIM_FR[I_1]
                      ELSE INTERIM_GR+INTERIM_WR[INTERIM_WP[I_2, I_1]]
                        *INTERIM_GR-INTERIM_WI[INTERIM_WP[I_2, I_1]]
                        *INTERIM_GI[INTERIM_GP2[I_2, I_1]];
      RETURN ALL INTERIM_GR ;
    NEW INTERIM_GI := DECLARE
      INTERIM_GI: REAL ;
      I_1: INTEGER;
      FOR EACH I_1 IN PROLIF(16) DO
        INTERIM_GI := IF I_2=1 THEN 0
                      ELSE INTERIM_GI+INTERIM_WR[INTERIM_WP[I_2, I_1]]
                        *INTERIM_GI+INTERIM_WI[INTERIM_WP[I_2, I_1]]
                        *INTERIM_GR[INTERIM_GP2[I_2, I_1]];
      RETURN ALL INTERIM_GI ;
  RETURN [ALL NEW INTERIM_GR, ALL NEW INTERIM_GI] ;
OUTFILE1_OUTR := DECLARE
  OUTFILE1_OUTR: INTEGER ;
  I_1: INTEGER;
  FOR EACH I_1 IN PROLIF(16) DO
    OUTFILE1_OUTR := INTERIM_GR[INTERIM_NB, I_1];
  RETURN ALL OUTFILE1_OUTR ;

OUTFILE2_OUTI := DECLARE
I-structure, 37
stream data type, 48
Lau system, 22
Local variables, 154

Mao, 9, 48
  bnf, 51
  matrices, 54
  multidimensional structures, 53
  restrictions, 147
  stored streams, 53
  structure types in, 52
  structured loops, 56
  the basic expression, 57
Manchester, 9
Manchester data flow machine, 40
  data and instruction formats, 43
  machine layout, 40
Matching store, 22
Matching unit, 22
Matrix, 54

Merging components, 118
  criteria for, 125
  valid candidates, 125
Model
  array graph, 74
  description, 72
  edge data structure, 83
  end array, 66
  iterative loops, 61
  node dimensionality, 80
  objectives, 61
  physical and virtual dimensions, 87
  precedence, 81
  range, 81
  range sets, 85
  size array, 65
  underlying graph, 73
Model, 3
Msec, 91

Node store, 22
Nonprocedural languages, 58
  common properties, 58
  environment, 59
  Lucid, 60
  Model, 60
  semantics, 59

Partitioning, 33
Physical dimensions, 87
Problem areas in data flow, 29
  data structures, 33
  partitioning the program, 32
  reentrancy of graph, 30
Processing unit, 22
Processor network topology, 24
  DDM1, 27
  TI DDP, 25

Range, 74
Referential transparency, 2

Schedule_component, 100
Schedule_graph, 99
Scheduling, 90
  a simple algorithm, 97
  assertion node, 101
  component graph, 91
  cycles in the array graph, 105
  description, 91
  distinguished dimension, 101
  efficiency, 112
  hanging, 105
  initialization, 98
  Msec, 91


